Read text file backwards

Posted: Thu Sep 25, 2008 12:27 am
by Marcus Frenkel
Hello All:

I'm trying to find a way to read a big text file backwards from the EOF, line by line, without loading the whole file into memory first. wxTextInputStream is good for reading a file line by line when we need to start at the beginning of the file. I'm having a hard time finding a way to do it backwards and I hope somebody can help me.

Marcus

Posted: Thu Sep 25, 2008 8:56 am
by doublemax
http://docs.wxwidgets.org/stable/wx_wxt ... wxtextfile will let you access individual lines.

However, internally the class will read the whole file, so for big files it might be slow, and for *really* big files even unusable.

Posted: Thu Sep 25, 2008 9:16 am
by Frank
Does wx have a wrapper for memory-mapped files? If so, I would use that.

Posted: Thu Sep 25, 2008 10:44 am
by Marcus Frenkel
Thanks for the thoughts. Yeah, wxTextFile has very nice functions, but I really need a solution for big log files that does not load them into memory first. The thing is that, depending on the needed date span, I will need to get, let's say, only the last 30 lines of the file, for example the details logged during the last week. The loop should start from EOF, check the date contained on the line, and proceed backwards if needed.

The other way around is to write new text at the top of this file instead of at the bottom (insert instead of append), and then read it line by line from top to bottom. But I think there is really no way to insert text at the top of a file without loading it into memory or making temporary files of around the same size as the original file.

Marcus

Posted: Thu Sep 25, 2008 11:02 am
by Romas
There is no fancy way to read files forward or backward line by line. Only the algorithm differs.

What would I do in your place? I would create an array of ints holding the offset of each line in the file. Parse the file as usual to see where each line lies in the file. Then iterate over the array backward, jump to each file offset and read the line. The result is what you want.

Posted: Thu Sep 25, 2008 11:21 am
by Marcus Frenkel
Hi Romans, thanks for the input. I don't really understand what you suggest. How can I see where each line lies in file or how can I count the number of lines in a file without loading the contents into memory?

Marcus

Posted: Thu Sep 25, 2008 11:23 am
by doublemax
I don't know of any class that would do what you need out of the box.

If you have control over the actual logging process you could try logging into separate files (e.g. one for each day) or logging into a database.

If not, you could check the source of the unix "tail" command and see how they do it.

Or write something yourself that reads blocks backwards from the end of the file, searches for line-ends and extracts the lines you need. Handling the lines overlapping block boundaries might be a little tricky, but certainly no rocket science :)
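
For illustration, here is a minimal sketch of that block-wise approach in plain C++ (no wxWidgets). The function name forEachLineBackward, the callback signature and the 64 KB default block size are assumptions made for this example, not an existing API. It reads fixed-size blocks starting from the end of the file, splits them at '\n', and carries the partial line at the start of each block over to the next (earlier) block, which is the boundary handling mentioned above:

```cpp
#include <algorithm>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: calls onLine for each line of the file, last line
// first, until onLine returns false. Returns false if the file cannot be read.
bool forEachLineBackward(const std::string& path,
                         const std::function<bool(const std::string&)>& onLine,
                         std::streamoff blockSize = 64 * 1024)
{
    if (blockSize <= 0)
        return false;

    // Binary mode, so that offsets are plain byte positions on every platform.
    std::ifstream file(path.c_str(), std::ios::binary);
    if (!file)
        return false;

    file.seekg(0, std::ios::end);
    const std::streamoff fileSize = file.tellg();
    std::streamoff pos = fileSize;   // bytes before pos are not processed yet

    // Strips a trailing '\r' (Windows line ends) and hands the line to the caller.
    auto emit = [&onLine](std::string line) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        return onLine(line);
    };

    std::vector<char> block(static_cast<std::size_t>(blockSize));
    std::string tail;    // partial line whose beginning lies in an earlier block
    bool first = true;

    while (pos > 0)
    {
        const std::streamoff n = std::min(blockSize, pos);
        pos -= n;
        file.seekg(pos, std::ios::beg);
        file.read(&block[0], static_cast<std::streamsize>(n));
        if (file.gcount() != n)
            return false;

        // This block followed by the partial line carried over from the block after it.
        std::string chunk(&block[0], static_cast<std::size_t>(n));
        chunk += tail;

        // A newline at the very end of the file terminates the last line;
        // it does not start an extra empty one.
        if (first && !chunk.empty() && chunk.back() == '\n')
            chunk.pop_back();
        first = false;

        // Emit every line that starts inside this chunk, last one first.
        std::size_t end = chunk.size();
        while (end > 0)
        {
            const std::size_t nl = chunk.rfind('\n', end - 1);
            if (nl == std::string::npos)
                break;
            if (!emit(chunk.substr(nl + 1, end - nl - 1)))
                return true;    // the caller asked to stop
            end = nl;
        }
        tail = chunk.substr(0, end);    // beginning is still unknown, keep it
    }

    // Whatever is left is the first line of the file.
    if (fileSize > 0)
        emit(tail);
    return true;
}
```

The callback returns false to stop early, so a caller could parse the date on each line and stop as soon as it is older than the wanted span. Memory use is roughly one block plus the longest line encountered.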

Depending on what you know about the log data, the task could be simplified by reading enough data that you know it will contain all the lines you're looking for. This works if each log line has an almost constant length and the number of log entries per day/hour can be estimated.

E.g. you know that each log entry has about 50 chars and you get an average of 1000 log entries per day. Double that for safety, and you just read 100k from the end of the file; the data you need has probably already been read in completely.
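
As a rough sketch of that shortcut (the helper name readTailLines and its parameters are made up for this example): with about 50 chars per entry and 1000 entries per day, doubled for safety, the estimate is 50 * 1000 * 2 = 100000 bytes. The function below seeks to that distance from the end, skips the line that is probably cut in half, and returns the remaining complete lines:

```cpp
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the complete lines found in the last
// tailBytes bytes of the file, oldest first.
std::vector<std::string> readTailLines(const std::string& path, std::streamoff tailBytes)
{
    std::vector<std::string> lines;

    // Binary mode, so that offsets are plain byte positions on every platform.
    std::ifstream file(path.c_str(), std::ios::binary);
    if (!file || tailBytes <= 0)
        return lines;

    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    const std::streamoff start = std::max<std::streamoff>(0, size - tailBytes);

    std::string line;
    if (start > 0)
    {
        // Start one byte early and discard everything up to the next newline.
        // If that byte is itself a newline, the line at 'start' is complete and is kept.
        file.seekg(start - 1, std::ios::beg);
        std::getline(file, line);
    }
    else
    {
        file.seekg(0, std::ios::beg);
    }

    while (std::getline(file, line))
    {
        // Strip the '\r' left over from Windows line ends.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

A call such as readTailLines(path, 50 * 1000 * 2) would then hold only about 100k of text in memory, whatever the size of the log file. If the oldest returned line is still newer than the wanted date, the estimate was too small and the call can be repeated with a larger value.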

Posted: Thu Sep 25, 2008 11:45 am
by Marcus Frenkel
The problem is that the log entries have variable length. I'll try to write something on my own using C++ file I/O functions, even though I'm not really good (enough) in this field. A probable solution would be to read character by character backwards from EOF, moving the read position back one byte each time, and scan for \n or another terminating character like ";". I'll post back if I have some working code.

Marcus

Posted: Thu Sep 25, 2008 11:55 am
by Romas
hello,

Marcus, I am Romas :P

OK, even with the STL you can read a file line by line. Almost every compiler has its own STL port, so you don't lose the multiplatform feature. If I am right, the function is called "getline". It takes a stream as input, so you do not need to read char by char to find where a line ends. And on that note, reading char by char will slow things down. In the best case, if the file is not huge, you can read it into memory and then analyze the memory; in the worst case, you can read the file in chunks and analyze them. I will try to write up what I mean by the end of the day :)

Posted: Thu Sep 25, 2008 12:27 pm
by Romas
OK, I am an impatient person :) Here is the code (it is not connected with wxWidgets, but it works):

Code: Select all

    const int bufferSize = 1024 * 9;
    char buffer [bufferSize];

    vector<SLineInfo> nLines;
    ifstream file ("C:\\customfix.bak");

    // find all the lines and their offsets
    while (file.good ())
    {
        SLineInfo sli;
        sli.iOffset = file.tellg ();

        // stop as soon as nothing more can be read; testing only eof ()
        // would record one bogus entry after the last line
        // (a line longer than the buffer also ends the scan)
        if (!file.getline (buffer, bufferSize))
            break;
        sli.iLength = file.gcount ();

        nLines.push_back (sli);
    } //while

    // clear the eof/fail bits so the stream can be used again
    file.clear ();

    // print them backward
    for (vector<SLineInfo>::reverse_iterator it = nLines.rbegin (); it != nLines.rend (); ++it)
    {
        // reading the last line may set eof again, so clear before each seek
        file.clear ();
        file.seekg (it->iOffset, ios_base::beg);
        file.getline (buffer, bufferSize);

        cout << buffer << endl;
    } //for

    file.close ();
file.clear is needed because once you reach EOF, the state bit is set and further reading is stopped. You can certainly improve on this. The code was written in a few minutes, so don't be angry about missed bugs :P
Cheers.

Posted: Thu Sep 25, 2008 3:46 pm
by Auria
Your query reminds me of LMX: http://adiumx.com/blog/category/lmx/

In any case, I'm pretty sure you'll have to come up with your own custom solution.

Posted: Mon Sep 29, 2008 10:32 am
by Marcus Frenkel
Thank you Romas for writing the code. It has some bugs; I'll try to fix them and post back. I'll try to use wxHashMap, because when I use vector I get the error <'SLineInfo' : undeclared identifier> even though I have

Code: Select all

#include <iostream>
#include <vector>
#include <string>
using namespace std;


Just one thing to be sure. With your code, is only one line loaded into memory at a time, or is the text of all the lines in memory once the loop is done?

Marcus

Posted: Mon Sep 29, 2008 12:40 pm
by Romas
Hello Marcus,

Oops, my mistake, I didn't paste all of the code :) Sorry for that.

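struct SLineInfo
{
    std::streamsize iLength; // length of the line, as returned by gcount ()
    std::streamoff iOffset;  // byte offset of the line start; avoids int overflow on very large files
};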

The program holds only one line in memory per loop iteration (but you can change that to suit your needs). Apart from that, it keeps only each line's file offset and length. I hate writing programs that use a lot of memory: just imagine if your file is 10 megs! :) Cheers.

Posted: Mon Sep 29, 2008 1:29 pm
by Marcus Frenkel
I added that to the code and I no longer get a compilation error. But now, when I try to get the text of each line in the 2nd loop, I get an empty string or sometimes one "random" character from the text:

Code: Select all

...
file.seekg ((*it).iOffset, ios_base::beg);
file.getline (buffer, bufferSize);

wxString mystring(buffer, wxConvUTF8);
wxMessageBox(mystring, wxT(""),wxYES_NO | wxCANCEL, frame);
...
What am I missing? The text file has only text characters.

Thanks again, Marcus

Posted: Mon Sep 29, 2008 1:49 pm
by Romas
Did you remove file.clear (if yes, put it back)? On what platform are you working?

My code runs on Windows XP and works as it is intended to work. Can you debug the program? :)