Issue
I'm working through the first exercise in John Crickett's coding challenges, which is to create a wc clone that counts the number of lines, bytes, words and characters in a text file. I'm on Step 4, counting characters.
My code so far is as follows:
public static long countChars(File inputFile) {
    long count = 0;
    try (BufferedReader reader = new BufferedReader(new FileReader(inputFile))) {
        String line;
        while ((line = reader.readLine()) != null)
        {
            line = line.replaceAll("\uFEFF",""); // Remove BOM
            count += line.length();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return count;
}
The issue is that wc returns an output of 339292, whereas my code is returning 325001. My initial suspicion was that my code was simply ignoring line break characters, and I noticed something interesting. The difference between the output of wc and my own code's output is 14291 missing characters, which is double the number of lines plus one.
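Spelled out, the arithmetic is as follows (7145 is the line count that this difference implies):

```latex
\begin{aligned}
339292 - 325001 &= 14291 \\
                &= 2 \times 7145 + 1
\end{aligned}
```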
I am trying to understand the following:
- Why are there two line-break characters per line rather than one? Surely every line has just one line break at the end of it?
- What is the extra single character?
- Is it naive to simply double the line count and add it onto the character count (and then +1)? Will this trip me up in some edge case somehow?
Solution
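It seems wc is counting \r and \n each as a character and you're not: BufferedReader.readLine() returns the line without its line terminator. Your file evidently uses Windows-style \r\n endings, hence the two missing characters per line.
Add to that the BOM you are removing, and you get exactly the difference you are seeing. Doubling the line count only happens to work for this particular file; it breaks as soon as a file uses plain \n endings, mixes endings, or has no BOM.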
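A small self-contained sketch to see this in action (the class and method names are made up for illustration). It counts a tiny in-memory string both ways, once line by line as in the question and once character by character, and prints the difference for \r\n and for \n endings:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class LineEndingDemo {

    // Mirrors the question's approach: readLine() drops the line terminators,
    // and the BOM is stripped before counting.
    static long countViaReadLine(String text) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                count += line.replace("\uFEFF", "").length();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Counts every character, including \r, \n and the BOM.
    static long countViaRead(String text) {
        long count = 0;
        try (Reader reader = new StringReader(text)) {
            while (reader.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String crlf = "\uFEFFone\r\ntwo\r\nthree\r\n"; // BOM + 3 CRLF-terminated lines
        String lf = "\uFEFFone\ntwo\nthree\n";         // BOM + 3 LF-terminated lines
        System.out.println(countViaRead(crlf) - countViaReadLine(crlf)); // prints 7 (2 * 3 + 1)
        System.out.println(countViaRead(lf) - countViaReadLine(lf));     // prints 4 (1 * 3 + 1)
    }
}
```

The second line of output is why the "double it and add one" shortcut is fragile: with \n endings the gap is one character per line, not two.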
You should probably not be reading line by line if you want to count the \r and the \n. Instead, I would read character by character; to count lines, keep a flag that is set whenever you read a \r, so that the \n following it does not increment the line count a second time. Actually, it seems you are making several passes over the file: one for the lines, another for the characters, and I guess another for the words.
So you just need this:
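import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public static long countChars(File inputFile) {
    long count = 0;
    // Decode as UTF-8 (FileReader(File, Charset) needs Java 11+) and count every
    // character, including \r, \n and the BOM
    try (Reader reader = new FileReader(inputFile, StandardCharsets.UTF_8)) {
        while (reader.read() != -1) // read() returns -1 only at end of stream; 0 is a valid char
        {
            count++;
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return count;
}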
But you could compute everything in one pass, using the flag I suggested above to remember that the last character read was a \r, so that the following \n is not counted as another line.
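A minimal sketch of that one-pass idea, assuming Java 16+ for the record; the class, record and method names are hypothetical. Words are approximated as runs of non-whitespace according to Character.isWhitespace, and bytes are left out because they need the raw InputStream rather than a decoding Reader:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class OnePassCounter {

    // Holder for the three counts produced in a single pass.
    record Counts(long lines, long words, long chars) {}

    static Counts count(Reader in) {
        long lines = 0, words = 0, chars = 0;
        boolean lastWasCR = false; // the "last read was a \r" state
        boolean inWord = false;
        try (Reader reader = new BufferedReader(in)) {
            int c;
            while ((c = reader.read()) != -1) {
                chars++; // counts UTF-16 units, so characters outside the BMP count as 2
                if (c == '\r') {
                    lines++;                      // a \r ends a line...
                } else if (c == '\n' && !lastWasCR) {
                    lines++;                      // ...so a \n right after it is not counted again
                }
                lastWasCR = (c == '\r');
                if (Character.isWhitespace(c)) {
                    inWord = false;
                } else if (!inWord) {
                    inWord = true;
                    words++;                      // first non-whitespace char starts a word
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Counts(lines, words, chars);
    }

    public static void main(String[] args) {
        Counts c = count(new StringReader("\uFEFFone two\r\nthree\r\n"));
        System.out.println(c); // prints Counts[lines=2, words=3, chars=17]
    }
}
```

Note that wc -l itself only counts \n characters, so this agrees with it on \n and \r\n files but not on files with bare \r endings.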
Answered By - Andrés Alcarraz