reading text files line by line (CSV files)... Topic is solved

fantaz
I live to help wx-kind
Posts: 169
Joined: Mon Sep 13, 2004 10:07 am
Location: Croatia

reading text files line by line (CSV files)...

Post by fantaz »

Hey all,
I'm trying to write a piece of code that reads massive text files. I found some code at http://www.codeguru.com/Cpp/W-P/files/f ... php/c4477/.
Since I'd like this code to be wx compliant, I have ported it to wx-like syntax. However, I'd like it to read Unix, Mac and DOS files line by line. So, how do I do it? The trick is in the GetNextLine() method... I mean, how do I incorporate the EOL handling from wxTextBuffer, and maybe, if the original author agrees, contribute the result to wx?
It seems to me this could be a good thing for files with comma-separated values, unless there is something already written and tested....

Example:

Code: Select all

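#include "wx/wx.h"
#include "StringFile.h"

void SomeFun(const string& filename)
{
	CStringFile sfText;
	wxString linestr;	// receives one line per call
	if(sfText.Open(filename))
	{
		// Read all the lines (one by one)
		while(sfText.GetNextLine(linestr)!=0)
		{
			// Pass the line as an argument, not as the format string,
			// so a '%' in the data is not misinterpreted
			wxLogMessage(wxT("%s"), linestr.c_str());
		}
		sfText.Close(); // Close the opened file
	}
}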
StringFile.cpp...

Code: Select all

// StringFile.cpp: implementation of the CStringFile class.
//
//////////////////////////////////////////////////////////////////////
#include "wx/wx.h"
#include "wx/file.h"
#include "StringFile.h"


// =======================Function Description=======================
// 
// FUNCTION	:  CStringFile::CStringFile()
// 
// Description: This is the constructor for this class. The size of
//	the internal buffer can be set in the constructor.
// ------------------------------------------------------------------
// ==================================================================
CStringFile::CStringFile(int nBufSize)
{
	m_dwRead = nBufSize;
	m_nBufferSize = nBufSize;
	m_pBuffer = new char[nBufSize];
	m_dwMasterIndex = 0;
	m_dwIndex = 0;
	m_dwLine = 0;
}

CStringFile::~CStringFile()
{
	delete[] m_pBuffer;	// allocated with new[], so release with delete[]
}

// =======================Function Description=======================
// 
// FUNCTION	:  CStringFile::Open()
// 
// Description:This function opens a file for reading. The file is 
//			opened read-only (wxFile::read).
// ------------------------------------------------------------------
bool CStringFile::Open(string szFile)
{
	if(!m_fFile.Open(szFile.c_str(), wxFile::read))
		return false;
	m_nMaxSize = (ssize_t)m_fFile.Length();
	return true;
}

// =======================Function Description=======================
// 
// FUNCTION	:  CStringFile::Close()
// 
// Description:Closes our previously opened file
// ------------------------------------------------------------------
// ==================================================================
void CStringFile::Close()
{
	m_fFile.Close();
	m_dwMasterIndex = 0;
	m_dwIndex = 0;
	m_dwLine = 0;
}

// =======================Function Description=======================
// 
// FUNCTION	:  CStringFile::GetNextLine()
// 
// Description	: Read the next line from a file
// 
// ------------------------------------------------------------------
// ==================================================================
size_t CStringFile::GetNextLine(wxString &szLine)
{
	char	*szBuffer;
	size_t	dwReturn;
	szBuffer = new char[m_nBufferSize];
	dwReturn = this->GetNextLine(szBuffer,m_nBufferSize);
	if(dwReturn != 0)
		szLine = szBuffer;
	else szLine = wxEmptyString;	// nothing read
	delete[] szBuffer;	// allocated with new[], so release with delete[]
	return dwReturn;
}

// =======================Function Description=======================
// 
// FUNCTION	:  CStringFile::GetNextLine()
// 
// Description	: Read the next line from a file
// 
// ------------------------------------------------------------------
// ==================================================================
size_t CStringFile::GetNextLine(char* szLine, int iLineSize)
{
	char	*chTemp;
	bool	bStop=false;
	int		nOut;

	chTemp =  szLine;
	*chTemp = 0;
	nOut = 0;
	while(!bStop)
	{
		if(!m_dwLine || m_dwIndex==m_dwRead)
		{
			m_dwMasterIndex = m_fFile.Seek(0,wxFromCurrent);
			m_dwRead=m_fFile.Read( m_pBuffer,m_nBufferSize);
			m_dwIndex = 0;
			if(m_dwRead == 0)
			{
				bStop = true; // read error or end of file encountered
				if(nOut>0)
				{
					// flush the unterminated last line and count it
					chTemp[nOut++] = (char) 0;
					return ++m_dwLine;
				}
				else return m_dwLine = 0; // nothing read
			}
			else
			{
				if(m_dwRead !=  (size_t)m_nBufferSize)
					bStop = true;	//END-OF-FILE
			}
		}
		for(;m_dwIndex < m_dwRead;m_dwIndex++)
		{
			if((nOut+1) == iLineSize)
			{
				m_szError.Printf(wxT("m_pBuffer overflow in line %lu (line length over %d chars)"),(unsigned long)++m_dwLine,iLineSize);
				wxLogWarning(m_szError);
				chTemp[nOut] = '\0';
				return m_dwLine;
			}
			switch(m_pBuffer[m_dwIndex])
			{
			case 0x0d: // CR: classic Mac line ending, or first half of a DOS CR LF pair
			case 0x0a: // LF: Unix line ending
				if((m_dwIndex+1) < m_dwRead) // look ahead only if the next byte is still inside the buffer
					if(m_pBuffer[m_dwIndex+1] == '\n' || m_pBuffer[m_dwIndex+1] == '\r')
					{
						if(!*chTemp)
							m_dwLine++;
						m_dwIndex++;
					}
				if(*chTemp)
				{
					chTemp[nOut++] = '\0';
					m_dwLine++;
					return m_dwLine;
				}
				break;
			default: chTemp[nOut++] = m_pBuffer[m_dwIndex];
			}
		}
	}
	if(nOut>0)
	{
		// last line of the file had no terminator: return it and count it
		chTemp[nOut++] = '\0';
		return ++m_dwLine;
	}
	return m_dwLine = 0; // nothing read
}

// =======================Function Description=======================
// 
// FUNCTION	:  CStringFile::Reset()
// 
// Description:Reset the linecounter....this function effectively sets
//			the reading pointer to 0, so when the GetNextLine is 
//			executed it will start reading the first line
// ------------------------------------------------------------------
// ==================================================================
void CStringFile::Reset()
{
	m_dwIndex = 0;
	m_dwLine = 0;
	m_fFile.Seek(0,wxFromStart);
}
StringFile.h

Code: Select all

#ifndef _STRINGFILE_H_
#define _STRINGFILE_H_

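#define SFBUF_SIZE	2048
#include <string>
#include "wx/string.h"	// wxString, used in the declarations below
#include "wx/file.h"	// wxFile member
using namespace std;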

class CStringFile  
{
public:
	CStringFile(int nBufSize = SFBUF_SIZE);
	virtual ~CStringFile();
	bool Open(string szFile);
	void Close();
	void Reset(void);
	size_t GetNextLine(char* szLine,int iLineSize);
	size_t GetNextLine(wxString &szLine);

protected:
	int m_nBufferSize;
	wxString m_szError;
	ssize_t	m_nMaxSize;
	size_t	m_dwRead;
	size_t	m_dwLine;
	size_t	m_dwMasterIndex;
	size_t	m_dwIndex;
	int		m_nSectionCount;
	char	*m_pBuffer;
	wxFile	m_fFile;
};

#endif //_STRINGFILE_H_

llama9000
Knows some wx things
Posts: 30
Joined: Mon May 30, 2005 3:32 am

Post by llama9000 »

Hi there,

Have you tried wxTextFile + wxStringTokenizer? If CSVs are all that you'll parse these two should work wonderfully.

Being a lazy guy I use them all the time... just read the whole line and strip out the bits you want :D

http://www.wxwidgets.org/manuals/2.6.1/ ... wxtextfile
http://www.wxwidgets.org/manuals/2.6.1/ ... gtokenizer

Post by fantaz »

Yeah, I've considered wxTextFile, and I do plan to use wxStringTokenizer, but...
wxTextFile loads the whole file into memory, which is not good for a 20 MB text file. Also, the documentation on wxTextFile clearly states:
One word of warning: the class is not at all optimized for big files and thus it will load the file entirely into memory when opened. Of course, you should not work in this way with large files (as an estimation, anything over 1 Megabyte is surely too big for this class). On the other hand, it is not a serious limitation for small files like configuration files or program sources which are well handled by wxTextFile.
What I tried to do here is:
1. Load the buffer

Code: Select all

wxFile m_fFile;
m_fFile.Read(m_pBuffer, m_nBufferSize);
2. Figure out the EOL sign
3. Return the line in a wxString object and do something with it.
The problem is how to change the switch statement so the program can iterate through Mac, *NIX, or DOS files.
Would it be a good idea to subclass wxTextBuffer, or is there something else?
phlox81
wxWorld Domination!
Posts: 1387
Joined: Thu Aug 18, 2005 7:49 pm
Location: Germany

Post by phlox81 »

you can use std::map instead of a switch:

Code: Select all

std::map<wxString,wxString> eolmap;
eolmap["MSW"]="\r\n";
eolmap["UNIX"]="\n";
eolmap["MAC"]="\r";
//...
//somewhere else
SetEndOfLine(eolmap["MSW"]);
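
To choose one of these EOL strings automatically, the first terminator in the buffer can be inspected. The following is only a sketch in standard C++ under that assumption; detectEol is a hypothetical name, not a wx function. If the sample happens to end right after a CR, a CR LF file would be misreported as Mac style, so the sample should extend past the first terminator.

```cpp
#include <string>

// Hypothetical helper: returns the first line terminator found in `sample`,
// one of "\r\n", "\n" or "\r". Falls back to "\n" if there is none.
std::string detectEol(const std::string& sample)
{
    std::string::size_type pos = sample.find_first_of("\r\n");
    if (pos == std::string::npos)
        return "\n";                   // no terminator seen: assume Unix
    if (sample[pos] == '\n')
        return "\n";                   // Unix
    if (pos + 1 < sample.size() && sample[pos + 1] == '\n')
        return "\r\n";                 // DOS / Windows
    return "\r";                       // classic Mac
}
```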

Post by llama9000 »

Damn... missed the 'massive text file' part... I knew it was too easy to be true :)

What about streams? Like wxTextInputStream? It's got ReadLine() and stuff like that... you could use it with wxFileInputStream like the example suggests:

http://www.wxwidgets.org/manuals/2.6.1/ ... nputstream

Post by fantaz »

Damn... I missed wxTextInputStream!
You're right, llama9000, this is the answer to all of my problems. In other words, the class I've been working on is completely obsolete. One only has to take into account the data structure (i.e. comma, colon, semicolon, space or tab separated values)... As I suspected, wxlib IS the best lib out there... Wow!!! Big thanks...
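
Once each line is available as a string, splitting it on the chosen separators is the remaining step. Here is a rough sketch in standard C++, standing in for wxStringTokenizer rather than reproducing it; splitFields is a made-up helper name. It keeps empty fields and does not handle quoted fields that contain a separator.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `line` at every character found in `delims`.
// Empty fields are kept, so "a,,b" yields three fields.
std::vector<std::string> splitFields(const std::string& line,
                                     const std::string& delims = ",")
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type end = line.find_first_of(delims, start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start));   // last field
            return fields;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;                            // skip the separator
    }
}
```

For semicolon or tab separated data, the call would be splitFields(line, ";\t").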

Post by llama9000 »

Glad it helps. Cheers :wink: