I have most of this working, and I will add the code at the end of the post for you all to look at.
Essentially, what I am trying to do is tokenize a string that looks like:
Code:
program int ABC, D;
begin read ABC; read D;
while (ABC != D) begin
if (ABC > D) then ABC = ABC - D;
else D = D - ABC;
end;
end;
write D;
end
My tokenizing works just fine for simpler versions of this, but it breaks when it gets to the "special symbols" that are a combination of more than one special character. The special symbols are:
Code:
; , = ! [ ] && or ( ) + - * != == < > <= >=
Now I obviously have to look for those as tokenizing points, but the problem comes when I get to the two-character ones like &&, or, !=, etc.
I was wondering if there is a way to make it look for the combination != instead of just the ! and then the =.
If there is no easy way to do that, what would be the best and easiest way to parse that string into tokens? I am leaning towards String.split but am not quite sure how to set it up. Some examples or pointers would be welcome!
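For illustration, here is a minimal sketch of one way the combination != could be matched before the single !, using java.util.regex instead of String.split. The class name RegexTokenizerSketch and the token pattern are assumptions made up for this example and are not part of KHTokenizer. The idea is only that regex alternation tries alternatives left to right, so listing the two-character symbols first lets them win.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of KHTokenizer.
class RegexTokenizerSketch
{
    // Alternatives are tried left to right, so the two-character symbols
    // must come before the one-character symbols they start with.
    private static final Pattern TOKEN = Pattern.compile(
        "&&|!=|==|<=|>="              // two-character special symbols
        + "|[;,=!\\[\\]()+\\-*<>]"    // one-character special symbols
        + "|[A-Za-z0-9_]+");          // words: reserved words, identifiers, integers, "or"

    static List<String> tokenize(String input)
    {
        List<String> tokens = new ArrayList<String>();
        Matcher matcher = TOKEN.matcher(input);
        // find() skips anything the pattern does not match, which here means
        // whitespace and any character that is not part of the language.
        while (matcher.find())
        {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args)
    {
        // prints [while, (, ABC, !=, D, ), begin]
        System.out.println(tokenize("while (ABC != D) begin"));
    }
}
```

One thing to keep in mind with this sketch is that unknown characters are silently dropped, so a real tokenizer would probably want to report them instead.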
Thanks, and here is the relevant part of the code:
Code:
import java.util.ArrayList;
import java.util.StringTokenizer;
import java.io.*;
/**
*
* @author Kyle Hiltner
*
*/
public class KHTokenizer implements KHTokenizerInterface
{
private String current_token; //used to specify current token
private int token_count=0; //used to keep track of which token is being asked for
private ArrayList<String> file = new ArrayList<String>(); //stores the parsed input file
/**
* Creates a new KHTokenizer with the name of the file as input
*
* @param inputFileName the specified file to be read from
* @throws IOException
*/
KHTokenizer(String inputFileName) throws IOException
{
FileReader freader = new FileReader(inputFileName); //create a FileReader for reading
BufferedReader inputFile = new BufferedReader(freader); //pass that FileReader to a BufferedReader
String theFile = Create_String_From_File(inputFile); //create a space separated string for easier tokenizing
inputFile.close(); //the whole file has been read, so release it
StringTokenizer tokenized_input_file = new StringTokenizer(theFile, ";=,()[] ", true); //tokenize the string using ; = , ( ) [ ] and " " as delimiters, which are also returned as tokens
String_Tokenizer(tokenized_input_file, file); //create the array by adding tokens
this.current_token = file.get(this.token_count); //set the current token to the first in the array
}
//--------------------------//
//----Private Operations----//
//-------------------------//
/**
* Determines if the specified word is a special Reserved word
*
* @param reserved_word the current token
* @return true if and only if the reserved_word is a Reserved Word
*/
private static Boolean Is_Reserved_Word(String reserved_word)
{
//determine if reserved_word is one of the established Reserved Words
return ((reserved_word.equals("program")) || (reserved_word.equals("begin")) ||
(reserved_word.equals("end")) || (reserved_word.equals("int")) ||
(reserved_word.equals("if")) || (reserved_word.equals("then")) ||
(reserved_word.equals("else")) || (reserved_word.equals("while")) ||
(reserved_word.equals("read")) || (reserved_word.equals("write")));
}
/**
* Determines if the specified word is a Special Symbol
*
* @param special_symbol the current token
* @return true if and only if the special_symbol is a Special Symbol
*/
private static Boolean Is_Special_Symbol(String special_symbol)
{
//determines if special_symbol is one of the established Special Symbols
return ((special_symbol.equals(";")) || (special_symbol.equals(",")) ||
(special_symbol.equals("=")) || (special_symbol.equals("!")) ||
(special_symbol.equals("[")) || (special_symbol.equals("]")) ||
(special_symbol.equals("&&")) || (special_symbol.equals("or")) ||
(special_symbol.equals("(")) || (special_symbol.equals(")")) ||
(special_symbol.equals("+")) || (special_symbol.equals("-")) ||
(special_symbol.equals("*")) || (special_symbol.equals("!=")) ||
(special_symbol.equals("==")) || (special_symbol.equals("<")) ||
(special_symbol.equals(">")) || (special_symbol.equals("<=")) ||
(special_symbol.equals(">=")));
}
/**
* Determines if the specified token is an integer
*
* @param integer_token the current token to be converted to an integer
* @return true if and only if integer_token is an integer
*/
private static Boolean Is_Integer(String integer_token)
{
Boolean is_integer=false; //set up boolean for check
//try to convert the specified string to an integer
try
{
int integer_token_value = Integer.parseInt(integer_token); //convert the string to an integer
is_integer = true; //set is_integer to true
}
catch(NumberFormatException e) //if unable to parse the string to an integer set is_integer to false
{
is_integer = false; //set is_integer to false
}
return is_integer; //return whether the token is an integer
}
/**
* Determines if the specified token is an Identifier
*
* @param identifier_token the current token
* @return true if and only if the identifier_token is an identifier
*/
private static Boolean Is_Identifier(String identifier_token)
{
//rule out that it is a Reserved Word, Special Symbol, or integer so then it must be an Identifier; so return true or false
return ((!Is_Reserved_Word(identifier_token)) && (!Is_Special_Symbol(identifier_token)) && (!Is_Integer(identifier_token)));
}
/**
* Determines which value to assign to the specified token
*
* @param which_reserved_word_token the current token
* @return token_value the integer value relating to the Reserved Word token
*/
private static int Which_Reserved_Word(String which_reserved_word_token)
{
int token_value=0; //set initial token_value
//run through and check which Reserved word it is and then set it to the correct value
if(which_reserved_word_token.equals("program"))
{
token_value = ReservedWords.PROGRAM.ordinal()+1;
}
else if(which_reserved_word_token.equals("begin"))
{
token_value = ReservedWords.BEGIN.ordinal()+1;
}
else if(which_reserved_word_token.equals("end"))
{
token_value = ReservedWords.END.ordinal()+1;
}
else if(which_reserved_word_token.equals("int"))
{
token_value = ReservedWords.INT.ordinal()+1;
}
else if(which_reserved_word_token.equals("if"))
{
token_value = ReservedWords.IF.ordinal()+1;
}
else if(which_reserved_word_token.equals("then"))
{
token_value = ReservedWords.THEN.ordinal()+1;
}
else if(which_reserved_word_token.equals("else"))
{
token_value = ReservedWords.ELSE.ordinal()+1;
}
else if(which_reserved_word_token.equals("while"))
{
token_value = ReservedWords.WHILE.ordinal()+1;
}
else if(which_reserved_word_token.equals("read"))
{
token_value = ReservedWords.READ.ordinal()+1;
}
else
{
token_value = ReservedWords.WRITE.ordinal()+1;
}
return token_value; //return the token_value
}
/**
* Determines which value to assign to the specified token
*
* @param which_special_symbol_token the current token
* @return special_symbol_token_value the integer value relating to the Special Symbol token
*/
private static int Which_Special_Symbol(String which_special_symbol_token)
{
int special_symbol_token_value=0; //set initial value
//check to figure out which Special Symbol it is and assign the correct value
if(which_special_symbol_token.equals(";"))
{
special_symbol_token_value = SpecialSymbols.SEMICOLON.ordinal()+11;
}
else if(which_special_symbol_token.equals(","))
{
special_symbol_token_value = SpecialSymbols.COMMA.ordinal()+11;
}
else if(which_special_symbol_token.equals("="))
{
special_symbol_token_value = SpecialSymbols.EQUALS.ordinal()+11;
}
else if(which_special_symbol_token.equals("!"))
{
special_symbol_token_value = SpecialSymbols.EXCLAMATION_MARK.ordinal()+11;
}
else if(which_special_symbol_token.equals("["))
{
special_symbol_token_value = SpecialSymbols.LEFT_BRACKET.ordinal()+11;
}
else if(which_special_symbol_token.equals("]"))
{
special_symbol_token_value = SpecialSymbols.RIGHT_BRACKET.ordinal()+11;
}
else if(which_special_symbol_token.equals("&&"))
{
special_symbol_token_value = SpecialSymbols.AND.ordinal()+11;
}
else if(which_special_symbol_token.equals("or"))
{
special_symbol_token_value = SpecialSymbols.OR.ordinal()+11;
}
else if(which_special_symbol_token.equals("("))
{
special_symbol_token_value = SpecialSymbols.LEFT_PARENTHESIS.ordinal()+11;
}
else if(which_special_symbol_token.equals(")"))
{
special_symbol_token_value = SpecialSymbols.RIGHT_PARENTHESIS.ordinal()+11;
}
else if(which_special_symbol_token.equals("+"))
{
special_symbol_token_value = SpecialSymbols.PLUS.ordinal()+11;
}
else if(which_special_symbol_token.equals("-"))
{
special_symbol_token_value = SpecialSymbols.MINUS.ordinal()+11;
}
else if(which_special_symbol_token.equals("*"))
{
special_symbol_token_value = SpecialSymbols.MULTIPLY.ordinal()+11;
}
else if(which_special_symbol_token.equals("!="))
{
special_symbol_token_value = SpecialSymbols.NOT_EQUALS.ordinal()+11;
}
else if(which_special_symbol_token.equals("=="))
{
special_symbol_token_value = SpecialSymbols.EQUALS_EQUALS.ordinal()+11;
}
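else if(which_special_symbol_token.equals("<"))
{
special_symbol_token_value = SpecialSymbols.LESS_THAN.ordinal()+11; //assumes SpecialSymbols declares LESS_THAN (the enum is not shown in this post)
}
else if(which_special_symbol_token.equals(">"))
{
special_symbol_token_value = SpecialSymbols.GREATER_THAN.ordinal()+11;
}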
else if(which_special_symbol_token.equals("<="))
{
special_symbol_token_value = SpecialSymbols.LESS_THAN_OR_EQUAL_TO.ordinal()+11;
}
else
{
special_symbol_token_value = SpecialSymbols.GREATER_THAN_OR_EQUAL_TO.ordinal()+11;
}
return special_symbol_token_value; //return the correct value
}
/**
* Creates the string separated by white spaces to be read by the String Tokenizer
*
* @param input_file the stream to be converted into a string
* @return theFile the inputFile converted to a string
* @throws IOException
*/
private static String Create_String_From_File(BufferedReader input_file) throws IOException
{
String theFile="", keepReadingFromFile=""; //set initial value of the strings
//run through the stream and create a file
while(keepReadingFromFile != null)
{
keepReadingFromFile = input_file.readLine(); //read one line at a time
//if the line is null stop and break
if(keepReadingFromFile == null)
{
break;
}
else //keep reading from the file and make it into a string
{
theFile = theFile + keepReadingFromFile + " "; //add a space so the last word of one line does not run into the first word of the next
}
}
theFile = theFile.replaceAll("\\t", " "); //remove any tabs from the string and replace with spaces so it is easier to Tokenize
return theFile; //return the newly created string
}
/**
* Creates the array of tokens by tokenizing based on the given parameters
*
* @param theInputFile the token stream to read from
* @param file to store the individual tokens in
*/
private void String_Tokenizer(StringTokenizer theInputFile, ArrayList<String> file)
{
String token=""; //set up the initial token
//keep reading while there is still more in the token stream
while (theInputFile.hasMoreTokens())
{
token = theInputFile.nextToken(); //set token to the next token
//if the token is not a white space then add it to the array
if(!token.equals(" "))
{
file.add(token); //add token to the array
}
}
file.add("nill"); //add a final spot to designate the end of the file
}
//--------------------------//
//----Public Operations-----//
//--------------------------//
/**
* Returns the integer value of the current token
*
* @return the integer value of the current token
*/
public int getToken()
{
int token_number=0; //set initial value
//determine if the current token is a Reserved Word, Special Symbol, Identifier, or nill (for end of file)
if(Is_Reserved_Word(this.current_token))
{
token_number = Which_Reserved_Word(this.current_token); //determine the correct value for the Reserved Word
}
else if(Is_Special_Symbol(this.current_token))
{
token_number = Which_Special_Symbol(this.current_token); //determine the correct value for the Special Symbol
}
else if(Is_Integer(this.current_token))
{
token_number = 30; //the current token is an integer so set it to 30
}
else if(this.current_token.equals("nill"))
{
token_number = 32; //the current token is nill so set it to 32
}
else//(Is_Identifier(this.current_token))
{
token_number = 31; //the token is an identifier so set it to 31
}
return token_number; //return the token_number
}
/**
* Sets the current token as the next one in line
*/
public void skipToken()
{
//keep getting the next token as long as token_count is less than the size of the array
if(this.token_count < file.size()-1)
{
this.token_count++; //increase token_count
this.current_token = file.get(token_count); // get the new token
}
}
/**
* This method can only be called to convert an integer in string form to its integer value.
* If called on a non-integer token, an error is printed to the screen and execution of the Tokenizer is stopped.
*
* @return integer value of the specified token assuming the token is an integer
*/
public int intVal()
{
int integer_token_value=0; //set the initial value
//if true is returned then go ahead and convert
if(Is_Integer(this.current_token))
{
integer_token_value = Integer.parseInt(this.current_token); //parse the current_token string and get an integer value
}
else // print the error message and exit Tokenizing
{
System.out.print("You called intVal() on a non-integer token. You tried to convert the " );
if(Is_Reserved_Word(this.current_token))
{
System.out.print("reserved word " + "\"" + this.current_token +"\"" + " to an integer");
}
else if(Is_Special_Symbol(this.current_token))
{
System.out.print("special symbol " + "\"" + this.current_token +"\"" + " to an integer");
}
else
{
System.out.print("identifier " + "\"" + this.current_token +"\"" + " to an integer");
}
System.exit(1); //exit the system and quit tokenizing
}
return integer_token_value; //return the current_token integer value
}
/**
* Returns a string if and only if the token is of the id type.
*
* @return the name of the id token
*/
public String idName()
{
String id_token_name=""; //setup the initial value
//if the current_token is an Identifier then set it so and return it.
if(Is_Identifier(this.current_token))
{
id_token_name = this.current_token;
}
else // print message and quit tokenizing
{
System.out.print("You called idName() on ");
if(Is_Reserved_Word(this.current_token))
{
System.out.print("a reserved word, ");
}
else if(Is_Special_Symbol(this.current_token))
{
System.out.print("a special symbol, ");
}
else
{
System.out.print("an integer, ");
}
System.out.println("which is not an identifier token.");
System.exit(1); //exit and quit tokenizing
}
return id_token_name; //return the id_token_name if possible
}
}
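Since the code above already relies on StringTokenizer with the delimiters returned as tokens, another possible direction is to make every symbol character a delimiter and then glue two neighbouring pieces back together when they form a known two-character symbol. The sketch below is only an illustration under that assumption. MergeSymbolsSketch and TWO_CHAR_SYMBOLS are hypothetical names, and the symbol set is taken from the list of special symbols in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper for illustration only; not part of KHTokenizer.
class MergeSymbolsSketch
{
    private static final Set<String> TWO_CHAR_SYMBOLS =
        new HashSet<String>(Arrays.asList("&&", "!=", "==", "<=", ">="));

    static List<String> tokenize(String input)
    {
        // Every symbol character is a delimiter and is returned as its own piece.
        StringTokenizer pieces = new StringTokenizer(input, ";,=![]&()+-*<> \t", true);
        List<String> tokens = new ArrayList<String>();
        // True only when the last stored token sits directly before the current
        // piece, with no whitespace in between.
        boolean previousIsAdjacent = false;
        while (pieces.hasMoreTokens())
        {
            String piece = pieces.nextToken();
            if (piece.equals(" ") || piece.equals("\t"))
            {
                previousIsAdjacent = false; // whitespace breaks adjacency and is dropped
                continue;
            }
            int last = tokens.size() - 1;
            if (previousIsAdjacent && TWO_CHAR_SYMBOLS.contains(tokens.get(last) + piece))
            {
                tokens.set(last, tokens.get(last) + piece); // "!" followed by "=" becomes "!="
            }
            else
            {
                tokens.add(piece);
            }
            previousIsAdjacent = true;
        }
        return tokens;
    }

    public static void main(String[] args)
    {
        // prints [while, (, ABC, !=, D, ), begin]
        System.out.println(tokenize("while (ABC != D) begin"));
    }
}
```

Because pieces are merged only when nothing separates them, "a = = b" still comes out as two separate = tokens, while "a==b" gives a single ==.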