string & stringstream issue/Q

BusterT
Knows some wx things
Posts: 31
Joined: Fri Jan 01, 2010 12:07 am

Post by BusterT »

OK, maybe I will have better luck with this here.

I'm trying to improve some file loading and have the following:

Code:

        fHandle.seekg(0, std::ios::end);   // fHandle is an fstream.
        std::streampos length = fHandle.tellg();
        fHandle.seekg(0, std::ios::beg);

        vector<char> buffer(length);
        fHandle.read(&buffer[0], length);

        stringstream streambuffer;
        // 1.0: crashes with this on large files. Memory usage also increases substantially.
        //streambuffer.rdbuf()->pubsetbuf(&buffer[0], length);

        // 2.0: compiles and loads the file, but has an issue with extracting data beyond the first lot of bytes.
        string stringtemp = &buffer[0];
        streambuffer.str(stringtemp);

        // data extraction
        unsigned int Data = 0;
        int filepos = 0;
        streambuffer.seekg(filepos);
        streambuffer.read((char*)&Data, 4);

        // Data beyond this cannot be extracted, i.e. the following does not extract data properly.
        filepos = 4;
        streambuffer.seekg(filepos);
        streambuffer.read((char*)&Data, 4);

The problem is that alternative (1) crashes on large files, and method (2) has an issue with extracting data. So I'm wondering what I am doing wrong, or whether there's a better/more efficient way of handling this.

Any help would be great, thanks!
Auria
Site Admin
Posts: 6695
Joined: Thu Sep 28, 2006 12:23 am

Post by Auria »

Hi,

how large is "large"?
If you load a 200MB file into a string buffer, of course memory usage will increase significantly.

Also, generally, string buffers are not designed for huge files; maybe this is the cause of the crash.

Regarding the second example, of course if you read a few bytes at a time this will need to happen in a loop.
"Keyboard not detected. Press F1 to continue"
-- Windows
BusterT
Knows some wx things
Posts: 31
Joined: Fri Jan 01, 2010 12:07 am

Post by BusterT »

700 MB+.
Do string buffers have a size limitation on them? :?


About reading few bytes at a time: as I said, it is not working:

In the above code, the filepos of the streambuffer is moved, hence the .read operation should be reading accordingly, but it's not. Nor do any further read operations work when filepos is moved accordingly. So the loop isn't the problem (at least I cannot see how it is relevant).

I have another similar version that works with no problem, but I'm trying to improve file loading by putting the data from disk to buffer so it doesn't take so long to load.
Auria
Site Admin
Posts: 6695
Joined: Thu Sep 28, 2006 12:23 am

Post by Auria »

Please show us the code you used with the loop. The code you posted does not contain any loop, so we can't comment on the version with the loop you tried.
Generally, you do not need to seek manually, since the read pointer is advanced automatically.

String streams are a poor fit for binary data here, first and foremost because (IIRC) building a string from a plain char pointer relies on '\0' marking the end of the string, so as soon as a byte worth 0 is found in the file data the copy would stop. Then there is the additional problem that a stream doesn't know the size in advance: it starts with a small buffer; when the small buffer is filled, it allocates a bigger buffer and copies the data into it; and so on. This will be slow and will necessarily consume more than the base 700 MB. If your OS is tight on RAM, maybe it just fails to find a 700 MB contiguous memory block.
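A minimal sketch of the '\0' point, for illustration only. The helper names (fromCString, fromBytes, readUInt32At) are hypothetical and not from the posted code; the sketch assumes the file bytes are already in a std::vector<char>. A string built from a bare char pointer stops at the first zero byte, so a later seek past that point fails; a string built with an explicit length keeps every byte.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Copies only up to the first zero byte: the const char* constructor treats
// its argument as a C string. If the buffer held no zero byte at all, this
// would read past the end (undefined behaviour).
std::string fromCString(const std::vector<char>& buffer)
{
    return std::string(buffer.data());
}

// Copies every byte, zeros included, because the range is given explicitly.
std::string fromBytes(const std::vector<char>& buffer)
{
    return std::string(buffer.begin(), buffer.end());
}

// Tries to read four bytes at byte offset `pos` (native byte order).
// Returns false if the stream cannot seek there or cannot supply the bytes.
bool readUInt32At(std::istream& in, std::streamoff pos, std::uint32_t& value)
{
    in.clear();                          // forget any earlier failure
    in.seekg(pos, std::ios_base::beg);
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return !in.fail();
}
```

With an eight-byte buffer whose second byte is zero, a stream built from fromCString holds a single byte, so reading at offset 4 fails, while the same read on a stream built from fromBytes succeeds. Note that this still copies the whole file into the string.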
BusterT
Knows some wx things
Posts: 31
Joined: Fri Jan 01, 2010 12:07 am

Post by BusterT »

OK, thank you for the explanation. That may be the reason why it crashes on large files.

There's really too much code to post. My initial post outlined sample code of the problem; there is no loop in it since I need to manually move the seek position due to the nature of the data.


All this aside though, what would be the best approach in dealing with:
1. put the file into a buffer, so I don't have to constantly read from disk;
2. not crash (or rather, have fewer limitations than with a string stream?);
3. not necessitate a loop, as it's not ideal for what I'm doing, i.e. the data is not really consistent and requires some manual intervention as far as accessing it is concerned.


Essentially, I'm just looking for a way to load all the data into a buffer, and then access this buffer instead of accessing the disk. I just seem to be hitting a brick wall no matter which way I attempt this.


Thank you once again!
Auria
Site Admin
Posts: 6695
Joined: Thu Sep 28, 2006 12:23 am

Post by Auria »

Loading the entire file into a memory buffer should be possible if the target computer has enough free contiguous RAM. The idea would be that you should not use any string class to handle binary data. Now I don't know if your file contains string or binary data; I suspect binary data so a string class would not be appropriate.
A simple way to load the data into a memory buffer would be to get the size; then allocate a char* buffer of the right size; then read the file piece by piece, in a loop, filling the char* buffer.

That's one way to do it, of course there may be others. I just recommend avoiding any class that dynamically increases its size as needed, for 700MB using dynamically resizing components will not work well.
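The size-first approach described above could be sketched as follows. This is an illustration under assumptions, not the original poster's code: loadFile is a hypothetical name, a std::vector<char> sized once stands in for the raw char* buffer, and the file is opened in binary mode (the original snippet does not show how fHandle was opened). A single read() call is used here instead of a loop; reading in chunks would work the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Reads the whole file at `path` into `buffer`, allocating exactly once.
// Returns false if the file cannot be opened, its size cannot be
// determined, or the read comes up short.
bool loadFile(const std::string& path, std::vector<char>& buffer)
{
    // std::ios::ate opens with the position at the end, so tellg() is the size.
    std::ifstream file(path.c_str(), std::ios::binary | std::ios::ate);
    if (!file) return false;

    const std::streamoff size = file.tellg();
    if (size < 0) return false;
    file.seekg(0, std::ios::beg);

    // One allocation of the final size; may throw std::bad_alloc if the
    // process cannot get that much memory.
    buffer.resize(static_cast<std::size_t>(size));
    if (size == 0) return true;

    file.read(buffer.data(), size);
    return file.gcount() == size;
}
```

Because the buffer is sized before reading, no reallocation or intermediate copy takes place, so peak memory use stays close to the file size.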
briceandre
Ultimate wxWidgets Guru
Posts: 672
Joined: Tue Aug 31, 2010 6:22 am
Location: Belgium

Post by briceandre »

Auria wrote: Loading the entire file into a memory buffer should be possible if the target computer has enough free contiguous RAM
Sorry, but I do not agree: modern computers work with a virtual memory address space and a swap partition. So what you need is enough contiguous virtual memory (which generally is not a problem), and enough RAM + swap (taking the segmentation mechanism into account).
Auria wrote: I just recommend avoiding any class that dynamically increases its size as needed, for 700MB using dynamically resizing components will not work well.
I do not see why. Well-written resizing components double their reserved capacity each time it runs out. So 700 MB does not require that many resize operations, even if you start from a very small reserved capacity. And, in general, you can specify the reserved capacity up front so that no resize operation occurs during insertion.

If you enjoy playing with the STL, you can use, for example, a std::vector of unsigned chars, and reserve a capacity equal to the expected size of the file. You will then still be able to define streams on it and play with all this wonderful stuff :-)
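As a rough illustration of the reserve() point, the sketch below counts how often a std::vector reallocates while bytes are appended. countReallocations is a hypothetical helper written for this example; it detects a reallocation as a change in capacity().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends `count` bytes one at a time and returns how many times the vector
// had to reallocate (detected as a change in capacity()).
std::size_t countReallocations(std::vector<unsigned char>& bytes, std::size_t count)
{
    std::size_t reallocations = 0;
    std::size_t capacity = bytes.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        bytes.push_back(static_cast<unsigned char>(i));
        if (bytes.capacity() != capacity) {
            capacity = bytes.capacity();
            ++reallocations;
        }
    }
    return reallocations;
}
```

A vector that had reserve(n) called on it before n bytes are appended reports zero reallocations; one that starts empty reports several, each of which copies everything stored so far into a new block.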

But I agree with Auria that, for what needs to be done here, using the STL is maybe not necessary...
Auria wrote: I suspect binary data so a string class would not be appropriate.
I 100% agree with Auria on that point: a string is for text. You should use a char buffer for binary data.
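Once the data sits in a plain char buffer, values at arbitrary offsets can be pulled out without any stream at all. A minimal sketch, assuming the file stores 32-bit values in the machine's native byte order (the thread does not say); readUInt32 is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies a 32-bit value out of `buffer` starting at byte `offset`, in the
// machine's native byte order. Returns false if fewer than four bytes are
// available at that offset.
bool readUInt32(const std::vector<char>& buffer, std::size_t offset, std::uint32_t& value)
{
    if (offset > buffer.size() || buffer.size() - offset < sizeof value)
        return false;
    // memcpy avoids alignment problems that a pointer cast could cause.
    std::memcpy(&value, buffer.data() + offset, sizeof value);
    return true;
}
```

The offset plays the role of filepos in the original snippet: moving to another record is just a different offset, with a bounds check in place of the stream's fail state.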
BusterT
Knows some wx things
Posts: 31
Joined: Fri Jan 01, 2010 12:07 am

Post by BusterT »

OK, this is starting to make sense.
I'm still learning, especially file I/O.


So essentially, for binary data, just read it into a char *buffer.
Is there an existing class or way to tie/redirect a char *buffer to a file stream (or something else) after reading in a whole file, like you can with stringstream, so one can use the likes of .get/.read/.write etc. on the stream?
For example:

Code:

        char *buffer = new char[filesize];
        fHandle.read(&buffer[0],filesize);

        type? buffer2;
        buffer2.rdbuf()->pubsetbuf((char*)&buffer[0],filesize);
        buffer2.get(char variable,n); //or whatever else
Or do you specifically have to implement a class to handle such things yourself?


I have been looking at http://www.cplusplus.com/reference/iostream/ but I can't seem to find anything that allows for such redirection on char arrays.




Once again, thank you both.
PB
Part Of The Furniture
Posts: 4204
Joined: Sun Jan 03, 2010 5:45 pm

Post by PB »

I also don't think you're doing it the best way, anyway: did you check out wxMemory[Input/Output]Stream?

http://docs.wxwidgets.org/trunk/classwx ... tream.html
http://docs.wxwidgets.org/trunk/classwx ... tream.html
This also might be useful:
http://docs.wxwidgets.org/trunk/classwx ... _base.html
briceandre
Ultimate wxWidgets Guru
Posts: 672
Joined: Tue Aug 31, 2010 6:22 am
Location: Belgium

Post by briceandre »

I am not an STL expert, but I don't think you will be able to use a stream on a simple char buffer. In this case, you should consider using a std::vector.

Even with a file of 700 MB, for me, it should work. But it can be a good idea either to provide your own allocator mechanism, or to reserve enough space at creation.

Look at this link. It shows how to play with files, iterators, and vectors:
http://www.decompile.com/cpp/tips/istream_copy_file.htm
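On the question of putting a stream on top of a plain char buffer: one approach that is possible in standard C++ is to derive a small class from std::streambuf and point its get area at the existing memory with setg(). The sketch below is an illustration under assumptions, not a tested drop-in: MemoryBuf is a hypothetical name, it is read-only, it does not take ownership of the memory, and it overrides seekoff/seekpos so that seekg() and tellg() work. It uses the C++11 override keyword.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <streambuf>

// Read-only stream buffer over memory owned by the caller; nothing is copied.
class MemoryBuf : public std::streambuf
{
public:
    MemoryBuf(char* data, std::size_t size)
    {
        // Get area: [begin, current, end).
        setg(data, data, data + size);
    }

protected:
    // Called for seekg(offset, direction) and tellg().
    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which = std::ios_base::in) override
    {
        if (!(which & std::ios_base::in))
            return pos_type(off_type(-1));

        off_type base = 0;                                   // std::ios_base::beg
        if (dir == std::ios_base::cur) base = gptr() - eback();
        else if (dir == std::ios_base::end) base = egptr() - eback();

        const off_type target = base + off;
        if (target < 0 || target > egptr() - eback())
            return pos_type(off_type(-1));                   // out of range

        setg(eback(), eback() + target, egptr());
        return pos_type(target);
    }

    // Called for seekg(position).
    pos_type seekpos(pos_type pos,
                     std::ios_base::openmode which = std::ios_base::in) override
    {
        return seekoff(off_type(pos), std::ios_base::beg, which);
    }
};
```

To use it, construct a MemoryBuf over the loaded bytes and pass its address to a std::istream; seekg(), read(), get() and tellg() then operate directly on the buffer, which must outlive the stream.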
Auria
Site Admin
Posts: 6695
Joined: Thu Sep 28, 2006 12:23 am

Post by Auria »

briceandre, agreed, I should not have said "contiguous". I guess a more correct wording would be "sufficient free RAM to not start swapping like mad"

And my recommendation to avoid auto-resizing components was simply because when you load a file you can generally get the file size up front and thus avoid the resizes and copies. I agree it would still work, though allocating directly should be faster and also use less RAM. (700 MB with doubling: the buffer may reach 450 MB, then have to grow to hold 700 MB, which means at least 1150 MB will temporarily be in use while the old block is copied into the new one, which can be enough to make the computer run out of RAM and start swapping.)
BusterT
Knows some wx things
Posts: 31
Joined: Fri Jan 01, 2010 12:07 am

Post by BusterT »

Thank you for the replies. Looks like I'll have to do some more reading. Will give these a go & see how I go.
Thanks!