reading text files line by line (CSV files)...  [SOLVED]


Postby fantaz » Mon Sep 26, 2005 1:10 pm

Hey all,
I'm trying to write a piece of code that reads massive text files. I found a starting point at http://www.codeguru.com/Cpp/W-P/files/fileio/article.php/c4477/.
Since I'd like to make this code wx-compliant, I ported it to wx-style syntax. However, I'd like it to read Unix, Mac and DOS files line by line. So, how do I do it? The trick is in the GetNextLine() method... I mean, how do I incorporate the EOL handling from wxTextBuffer and maybe, if the original author agrees, contribute the result to wx?
It seems to me this could be a good thing for files with comma-separated values, unless there is something already written and tested...

Example:
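#include "wx/wx.h"
#include "StringFile.h"
void SomeFun(const std::string& filename)
{
   CStringFile sfText;
   wxString linestr;   // receives one line per call
   if(sfText.Open(filename))
   {
      // Read all the lines (one by one)
      while(sfText.GetNextLine(linestr)!=0)
      {
         // Pass the line as an argument, not as the format string
         wxLogMessage("%s", linestr.c_str());
      }
      sfText.Close(); // Close the opened file
   }
}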


StringFile.cpp...
// StringFile.cpp: implementation of the CStringFile class.
//
//////////////////////////////////////////////////////////////////////
#include "wx/wx.h"
#include "wx/file.h"
#include "StringFile.h"


// =======================Function Description=======================
//
// FUNCTION   :  CStringFile::CStringFile()
//
// Description: This is the constructor for this class. It is possible
//   to set the size of the internal buffer in the constructor.
// ------------------------------------------------------------------
// ==================================================================
CStringFile::CStringFile(int nBufSize)
{
   m_dwRead = nBufSize;
   m_nBufferSize = nBufSize;
   m_pBuffer = new char[nBufSize];
   m_dwMasterIndex = 0;
   m_dwIndex = 0;
   m_dwLine = 0;
}

CStringFile::~CStringFile()
{
   delete[] m_pBuffer;   // allocated with new[], so release with delete[]
}

// =======================Function Description=======================
//
// FUNCTION   :  CStringFile::Open()
//
// Description: This function opens a file for reading. The file is
//         opened with read-only access (wxFile::read).
// ------------------------------------------------------------------
bool CStringFile::Open(string szFile)
{
   if(!m_fFile.Open(szFile.c_str(), wxFile::read))
      return false;
   m_nMaxSize = (ssize_t)m_fFile.Length();
   return true;
}

// =======================Function Description=======================
//
// FUNCTION   :  CStringFile::Close()
//
// Description:Closes our previously opened file
// ------------------------------------------------------------------
// ==================================================================
void CStringFile::Close()
{
   m_fFile.Close();
   m_dwMasterIndex = 0;
   m_dwIndex = 0;
   m_dwLine = 0;
}

// =======================Function Description=======================
//
// FUNCTION   :  CStringFile::GetNextLine()
//
// Description   : Read the next line from a file
//
// ------------------------------------------------------------------
// ==================================================================
size_t CStringFile::GetNextLine(wxString &szLine)
{
   char   *szBuffer;
   size_t   dwReturn;
   szBuffer = new char[m_nBufferSize];
   dwReturn = this->GetNextLine(szBuffer,m_nBufferSize);
   if(dwReturn != 0)
      szLine = szBuffer;
   else szLine = "";   //Empty
   delete[] szBuffer;   // allocated with new[], so release with delete[]
   return dwReturn;
}

// =======================Function Description=======================
//
// FUNCTION   :  CStringFile::GetNextLine()
//
// Description   : Read the next line from a file
//
// ------------------------------------------------------------------
// ==================================================================
size_t CStringFile::GetNextLine(char* szLine, int iLineSize)
{
   char   *chTemp;
   bool   bStop=false;
   int      nOut;

   chTemp =  szLine;
   *chTemp = 0;
   nOut = 0;
   while(!bStop)
   {
      if(!m_dwLine || m_dwIndex==m_dwRead)
      {
         m_dwMasterIndex = m_fFile.Seek(0,wxFromCurrent);
         m_dwRead=m_fFile.Read( m_pBuffer,m_nBufferSize);
         m_dwIndex = 0;
         if(m_dwRead == 0)
         {
            bStop = true; //Error during readfile or END-OF-FILE encountered
            if(nOut>0)
            {
               chTemp[nOut++] = (char) 0;
               return m_dwLine;   
            }
            else return m_dwLine = 0; // nothing read
         }
         else
         {
            if(m_dwRead !=  (size_t)m_nBufferSize)
               bStop = true;   //END-OF-FILE
         }
      }
      for(;m_dwIndex < m_dwRead;m_dwIndex++)
      {
         if((nOut+1) == iLineSize)
         {
            // The caller's line buffer (szLine) is full, not m_pBuffer
            m_szError.Printf("Line buffer overflow in line %lu (line length over %d chars)",(unsigned long)++m_dwLine,iLineSize);
            wxLogWarning("%s", m_szError.c_str());
            chTemp[nOut] = '\0';
            return m_dwLine;
         }
         switch(m_pBuffer[m_dwIndex])
         {
         case 0x0d: // '\r': Mac EOL, or first half of a DOS "\r\n"
         case 0x0a: // '\n': Unix EOL, or second half of a DOS "\r\n"
            if((m_dwIndex+1) < m_dwRead) // Only peek ahead while the next byte is still inside m_pBuffer
               if(m_pBuffer[m_dwIndex+1] == '\n' || m_pBuffer[m_dwIndex+1] == '\r')
               {
                  if(!*chTemp)
                     m_dwLine++;
                  m_dwIndex++;
               }
            if(*chTemp)
            {
               chTemp[nOut++] = '\0';
               m_dwLine++;
               return m_dwLine;
            }
            break;
         default: chTemp[nOut++] = m_pBuffer[m_dwIndex];
         }
      }
   }
   if(nOut>0)
   {
      chTemp[nOut++] = '\0';
      return m_dwLine;   
   }
   return m_dwLine = 0; // nothing read
}

// =======================Function Description=======================
//
// FUNCTION   :  CStringFile::Reset()
//
// Description:Reset the linecounter....this function effectively sets
//         the reading pointer to 0, so when the GetNextLine is
//         executed it will start reading the first line
// ------------------------------------------------------------------
// ==================================================================
void CStringFile::Reset()
{
   m_dwIndex = 0;
   m_dwLine = 0;
   m_fFile.Seek(0,wxFromStart);
}


StringFile.h

#ifndef _STRINGFILE_H_
#define _STRINGFILE_H_

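#define SFBUF_SIZE   2048
#include <string>
#include "wx/string.h"   // wxString (m_szError, GetNextLine)
#include "wx/file.h"     // wxFile (m_fFile)
using namespace std;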

class CStringFile 
{
public:
   CStringFile(int nBufSize = SFBUF_SIZE);
   virtual ~CStringFile();
   bool Open(string szFile);
   void Close();
   void Reset(void);
   size_t GetNextLine(char* szLine,int iLineSize);
   size_t GetNextLine(wxString &szLine);

protected:
   int m_nBufferSize;
   wxString m_szError;
   ssize_t   m_nMaxSize;
   size_t   m_dwRead;
   size_t   m_dwLine;
   size_t   m_dwMasterIndex;
   size_t   m_dwIndex;
   int      m_nSectionCount;
   char   *m_pBuffer;
   wxFile   m_fFile;
};

#endif //_STRINGFILE_H_


Postby llama9000 » Wed Sep 28, 2005 4:52 am

Hi there,

Have you tried wxTextFile + wxStringTokenizer? If CSVs are all that you'll parse these two should work wonderfully.

Being a lazy guy I use them all the time... just read the whole line and strip out the bits you want :D

http://www.wxwidgets.org/manuals/2.6.1/wx_wxtextfile.html#wxtextfile
http://www.wxwidgets.org/manuals/2.6.1/wx_wxstringtokenizer.html#wxstringtokenizer

Postby fantaz » Wed Sep 28, 2005 8:57 am

Yeah, I've considered wxTextFile, and I do plan to use wxStringTokenizer, but...
wxTextFile loads the whole file into memory, which is not good for a 20 MB text file. Also, the documentation on wxTextFile clearly states:
One word of warning: the class is not at all optimized for big files and thus it will load the file entirely into memory when opened. Of course, you should not work in this way with large files (as an estimation, anything over 1 Megabyte is surely too big for this class). On the other hand, it is not a serious limitation for small files like configuration files or program sources which are well handled by wxTextFile.

What I tried to do here is:
1. Load the buffer
wxFile m_fFile;
m_fFile.Read(m_pBuffer, m_nBufferSize);

2. Figure out the EOL sequence
3. Return the line in a wxString object and do something with it.
The problem is how to change the switch statement so the program can iterate through a Mac, *NIX, or DOS file without caring which one it is.
Would it be a good idea to subclass wxTextBuffer, or should I do something else?
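
For reference, the three steps above can be sketched without any wx classes. This is only a minimal sketch under stated assumptions, not the CodeGuru class and not how wxTextBuffer does it: ChunkedLineReader and SplitLines are hypothetical names, the input is any std::istream, and unlike CStringFile it keeps empty lines. A '\r' ends the line immediately and sets a flag so that a directly following '\n' is swallowed, which also covers a "\r\n" pair that is split across two buffer refills.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of wxWidgets or the CodeGuru class):
// reads a stream in fixed-size chunks and yields one line per call,
// treating "\r\n", "\n" and "\r" each as a single line ending.
class ChunkedLineReader {
public:
    explicit ChunkedLineReader(std::istream& in, std::size_t bufSize = 2048)
        : m_in(in), m_buf(bufSize > 0 ? bufSize : 1) {}

    // Returns false once the input is exhausted. Empty lines are returned
    // as empty strings; a final line without an EOL is returned as well.
    bool GetNextLine(std::string& line) {
        line.clear();
        bool gotData = false;
        for (;;) {
            if (m_pos == m_len && !Refill())
                return gotData;              // end of input
            const char c = m_buf[m_pos++];
            if (m_skipLF) {                  // previous line ended with '\r'
                m_skipLF = false;
                if (c == '\n') continue;     // second half of "\r\n"
            }
            if (c == '\n') return true;      // Unix EOL
            if (c == '\r') {                 // Mac EOL or first half of DOS EOL
                m_skipLF = true;
                return true;
            }
            line.push_back(c);
            gotData = true;
        }
    }

private:
    // Loads the next chunk; returns false when nothing more could be read.
    bool Refill() {
        m_in.read(m_buf.data(), static_cast<std::streamsize>(m_buf.size()));
        m_len = static_cast<std::size_t>(m_in.gcount());
        m_pos = 0;
        return m_len > 0;
    }

    std::istream&     m_in;
    std::vector<char> m_buf;
    std::size_t       m_pos = 0;
    std::size_t       m_len = 0;
    bool              m_skipLF = false;
};

// Usage example: split an in-memory text with a deliberately tiny buffer.
std::vector<std::string> SplitLines(const std::string& text, std::size_t bufSize) {
    std::istringstream in(text);
    ChunkedLineReader reader(in, bufSize);
    std::vector<std::string> lines;
    std::string line;
    while (reader.GetNextLine(line))
        lines.push_back(line);
    return lines;
}
```

Only one chunk of the file is held in memory at a time, so the buffer size bounds the I/O granularity but not the line length; the std::string grows as needed.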

Postby phlox81 » Wed Sep 28, 2005 9:06 am

You can use std::map instead of a switch:

std::map<wxString,wxString> eolmap;
eolmap["MSW"]="\r\n";
eolmap["UNIX"]="\n";
eolmap["MAC"]="\r";
//...
//somewhere else
SetEndOfLine(eolmap["MSW"]);

Postby llama9000 » Wed Sep 28, 2005 10:56 am

Damn... missed the 'massive text file' part... I knew it was too easy to be true :)

What about streams? Like wxTextInputStream? It's got ReadLine() and stuff like that... you could use it with wxFileInputStream like the example suggests:

http://www.wxwidgets.org/manuals/2.6.1/wx_wxtextinputstream.html#wxtextinputstream

Postby fantaz » Wed Sep 28, 2005 11:45 am

Damn... missed the wxTextInputStream!
You're right, llama9000, this is the answer to all of my problems. In other words, the class I've been working on is completely obsolete. One only has to take into account the data structure (i.e. comma, colon, semicolon, space or tab separated values)... As I suspected, wxlib IS the best lib out there... Wow!!! Big thanks...
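
Once a line is in hand, taking the separator into account could look roughly like the sketch below. It uses plain std::string rather than wxStringTokenizer so that it stands alone, SplitFields is a hypothetical name, and it assumes a single-character delimiter with no quoted fields (a quoted value containing the delimiter would be split incorrectly).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one line on a single delimiter character.
// Empty fields are kept; quoted fields are NOT handled.
std::vector<std::string> SplitFields(const std::string& line, char delim = ',') {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = line.find(delim, start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start));   // last (or only) field
            return fields;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;                            // continue after the delimiter
    }
}
```

Calling SplitFields(line, ';') or SplitFields(line, '\t') covers the semicolon and tab cases; an empty line yields a single empty field.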

Postby llama9000 » Wed Sep 28, 2005 12:02 pm

Glad it helps. Cheers :wink:

