Number of lines of a file (Topic is solved)

pbneves
Experienced Solver
Posts: 86
Joined: Fri Feb 15, 2019 11:37 am

Number of lines of a file

Post by pbneves »

Hi,

I'm using the code below to count the lines in a file. The files can be huge (on the order of 100 MB), so I've used a wxFileInputStream.

Code:

wxFileInputStream input("file.xml");
wxTextInputStream text(input, wxT("\x09"), wxConvUTF8 );
while(input.IsOk() && !input.Eof() )
{
	  wxString line=text.ReadLine();
	  numLines++;
}
cout << numLines;
Why is it reporting the wrong number of lines? The code reports more lines than the file actually has: for a file with 2734 lines, it reports 2875.
Thanks,
PB
Part Of The Furniture
Posts: 4193
Joined: Sun Jan 03, 2010 5:45 pm

Re: Number of lines of a file

Post by PB »

Line endings are tricky as they vary between platforms, but the docs state:
The wxTextInputStream correctly reads text files (or streams) in DOS, Macintosh and Unix formats and reports a single newline char as a line ending.
However, I do not see any way to retrieve the EOL type wxTextInputStream uses, and it is not documented how it decides which one to use.
It seems that EOLs are processed with this code:

Code:

bool wxTextInputStream::EatEOL(const wxChar &c)
{
    if (c == wxT('\n')) return true; // eat on UNIX

    if (c == wxT('\r')) // eat on both Mac and DOS
    {
        wxChar c2 = GetChar();
        if (!c2) return true; // end of stream reached, had enough :-)

        if (c2 != wxT('\n')) UngetLast(); // Don't eat on Mac
        return true;
    }

    return false;
} 
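For a plain line count, the same three EOL conventions can be handled on the raw bytes, without decoding the text at all. The following is only a minimal sketch in standard C++, not wxWidgets code: `countLines` is a hypothetical helper, and it assumes the file contents are already in memory and use a single-byte encoding or UTF-8 (where the bytes `\r` and `\n` never occur inside another character). A final line without a trailing EOL is counted as a line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of wxWidgets): counts the lines in a byte
// buffer, treating "\n" (Unix), "\r\n" (DOS) and a lone "\r" (old Mac) each
// as exactly one line ending, like the EatEOL() logic quoted above.
std::size_t countLines(const std::string& data)
{
    std::size_t lines = 0;
    bool pendingText = false; // true while the current line has unterminated characters

    for (std::size_t i = 0; i < data.size(); ++i)
    {
        const char c = data[i];
        if (c == '\n')
        {
            ++lines;
            pendingText = false;
        }
        else if (c == '\r')
        {
            // Swallow the '\n' of a "\r\n" pair so it is not counted twice.
            if (i + 1 < data.size() && data[i + 1] == '\n')
                ++i;
            ++lines;
            pendingText = false;
        }
        else
        {
            pendingText = true;
        }
    }

    // Count a last line that has no trailing EOL.
    if (pendingText)
        ++lines;

    return lines;
}
```

For example, `countLines("a\nb\r\nc\rd")` returns 4. Because no character conversion is involved, a count obtained this way cannot be affected by a wrong encoding guess.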
Out of curiosity, if you open the file with wxTextFile, does its GetLineCount() return the number you think it should?

I know it sounds silly, but I would also check that numLines is initialized to 0. Also, just to be sure, I would change

Code:

while(input.IsOk() && !input.Eof() )
to

Code:

while(text.IsOk() && !text.Eof() )
pbneves
Experienced Solver
Posts: 86
Joined: Fri Feb 15, 2019 11:37 am

Re: Number of lines of a file

Post by pbneves »

Am I seeing it wrong, or does wxTextInputStream not have the methods IsOk() and Eof()?
PB
Part Of The Furniture
Posts: 4193
Joined: Sun Jan 03, 2010 5:45 pm

Re: Number of lines of a file

Post by PB »

It inherits those methods.
pbneves
Experienced Solver
Posts: 86
Joined: Fri Feb 15, 2019 11:37 am

Re: Number of lines of a file

Post by pbneves »

Solved,

I've created an output file from the lines read and compared it with the original file.
The problem was that text.ReadLine() stopped reading and inserted an EOL at Portuguese special characters like "Á".
Changing

Code:

wxTextInputStream text(input, wxT("\x09"), wxConvUTF8 );
to

Code:

wxTextInputStream text(input);
solved the problem and started reporting the correct line number.
Thanks
PB
Part Of The Furniture
Posts: 4193
Joined: Sun Jan 03, 2010 5:45 pm

Re: Number of lines of a file

Post by PB »

Ah, so the issue was that the encoding of that XML file was not UTF-8.
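A diagnosis like this can be checked up front by testing whether the file's bytes are well-formed UTF-8 before choosing a converter. Below is a minimal sketch in standard C++, not wxWidgets API: `isValidUtf8` is a hypothetical helper and assumes the data has already been read into a `std::string`. As an illustration, "Á" is the single byte 0xC1 in ISO-8859-1, and 0xC1 can never appear in valid UTF-8, where the same character is encoded as 0xC3 0x81.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of wxWidgets): returns true if `data` is
// well-formed UTF-8, false on the first malformed sequence.
bool isValidUtf8(const std::string& data)
{
    const std::size_t n = data.size();
    std::size_t i = 0;

    while (i < n)
    {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        std::size_t extra = 0; // number of continuation bytes expected
        unsigned long cp = 0;  // decoded code point

        if (lead < 0x80)                      { extra = 0; cp = lead; }
        else if (lead >= 0xC2 && lead <= 0xDF) { extra = 1; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead <= 0xEF) { extra = 2; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead <= 0xF4) { extra = 3; cp = lead & 0x07; }
        else
            return false; // stray continuation byte, or 0xC0, 0xC1, 0xF5..0xFF

        if (extra > n - i - 1)
            return false; // sequence is cut off at the end of the data

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(data[i + k]);
            if ((c & 0xC0) != 0x80)
                return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings, UTF-16 surrogates and out-of-range values.
        if (extra == 2 && cp < 0x800)      return false;
        if (extra == 3 && cp < 0x10000)    return false;
        if (cp >= 0xD800 && cp <= 0xDFFF)  return false;
        if (cp > 0x10FFFF)                 return false;

        i += extra + 1;
    }

    return true;
}
```

With this sketch, `isValidUtf8("\xC3\x81")` is true (UTF-8 "Á"), while `isValidUtf8("\xC1gua")` is false (ISO-8859-1 "Água"). A false result suggests that wxConvUTF8 is the wrong converter for the file.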