Let's start with the function definition:
bool read_file(char *p_file_name,char *file_content[FILE_NO_OF_ROWS][FILE_NO_OF_COL],const char *delimiter, int& token_count )
You are passing C-strings and arrays of C-strings. That's not a good start. Who owns the pointers? Who creates them, and am I supposed to free them after use? I can't tell from this interface.
About the only thing I can tell for sure is that token_count is an out parameter, but even then I am not sure whether I should reset it to zero before calling. Since the user of the code does not know whether they need to zero it first, the only safe thing for them is to do so; and then, if the function also sets it to zero internally, those instructions are wasted.
Also, passing arrays to a function is dodgy: it does not do quite what you expect. In this case the parameter actually decays to char *(*)[FILE_NO_OF_COL], which makes validation on the other side a pain. When passing arrays, don't give the function a false sense of security; be exact with your type (or learn how to pass an array by reference).
I am also betting that FILE_NO_OF_ROWS and FILE_NO_OF_COL are macros. Don't do that: macros are not confined by scope. Prefer static const objects. Note that if you use a vector, all that size information is stored for you.
Prefer to pass C++ objects: std::string is always good for passing strings, and why not a vector for a container of objects?
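For illustration only (the name readFile and the single-char delimiter are my assumptions, not your code), the whole interface could shrink to something like this, with ownership completely unambiguous:

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Read a delimited file into rows of tokens.
// The returned vector owns every string; nothing to free, nothing to document.
std::vector<std::vector<std::string>> readFile(std::string const& fileName,
                                               char delimiter)
{
    std::vector<std::vector<std::string>> result;

    std::ifstream file(fileName);
    std::string   line;
    while (std::getline(file, line))
    {
        std::vector<std::string> row;
        std::stringstream lineStream(line);
        std::string       token;
        while (std::getline(lineStream, token, delimiter))
        {
            row.push_back(token);
        }
        result.push_back(row);
    }
    return result;
}
```

The row and column counts fall out of the vector sizes, so the macros disappear as well.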
Unlike normal, I will not complain about this (as it is confined to a very strict scope), but you should be aware that it's frowned upon:
using namespace std;
Why take two lines to open a file?
ifstream file_is_obj;
file_is_obj.open(p_file_name,ios::in);
// I would have just done
std::ifstream file_is_obj(p_file_name);
Declare variables as close to their point of use as you can. Declaring them out here seems like a waste, and it makes the code more complex because you have to reset things manually rather than letting the compiler do it.
string buffer,token;
int token_pos;
token_count=0;
Sure, you can test whether it is open, but usually that is a waste of time: if the file failed to open, nothing else is going to work. So exit the function now and reduce the number of levels of indentation in the code.
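That is, invert the test and bail out early. A sketch with a simplified signature (the real function would keep its other parameters):

```cpp
#include <fstream>
#include <string>

// Test for failure and return immediately, so the real work
// is not nested inside an if block.
bool read_file(std::string const& p_file_name)
{
    std::ifstream file_is_obj(p_file_name);
    if (!file_is_obj)
    {
        return false;   // Nothing else can work; report failure and leave.
    }
    // ... the real work continues here, one indentation level shallower.
    return true;
}
```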
if( file_is_obj.is_open())
{
This is RARELY correct in any language.
while( !file_is_obj.eof() )
{
// STUFF
}
The pattern for reading from a file is:
while( <READ Value> )
{
// Read succeeded: continue to processing code.
// STUFF
}
This is because EOF is not set until you read past the end of the file. The last successful read reads up to, but not past, the end of file. At that point you have no data left to read, but the EOF flag is not yet set, so you enter the loop again. When you attempt to read the next item, it fails and sets the EOF flag. So if you insist on doing it your way, your code should look like this:
while( !file_is_obj.eof() )    // This is now redundant.
{
    <READ VALUE>
    if (file_is_obj.eof())     // You have to check for read failure
    {                          // in all languages.
        break;
    }
    // STUFF
}
Thus it is better to use the standard pattern, where you do the read as part of the test.
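Concretely, with std::getline the read itself becomes the loop condition, so the body only ever runs on a successful read (a small sketch; the function name is illustrative):

```cpp
#include <fstream>
#include <string>

// Count the lines in a file using the standard read-as-test pattern.
std::size_t countLines(std::string const& fileName)
{
    std::ifstream file(fileName);
    std::string   buffer;
    std::size_t   count = 0;
    while (std::getline(file, buffer))   // Read is part of the test:
    {                                    // the body runs only on success.
        ++count;
    }
    return count;
}
```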
If you declared buffer here, you would not need to manually clear it (or reset token):
buffer.clear();
token_pos=0;
Looking at your string splitting code:
for( int iter=0; token_pos != -1 || iter >= FILE_NO_OF_COL ; iter++)
You are making assumptions about the result of std::string::find(). Why do you think token_pos != -1 will ever be correct? The documentation is very clear: if there is no match, find() returns std::string::npos. Don't make any assumptions about what that value is.
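A sketch of the correct test; note also that find() returns std::string::size_type, so the result should not be stored in an int at all:

```cpp
#include <string>

// Return true if `text` contains `delimiter`, using the documented
// sentinel std::string::npos rather than a hard-coded -1.
bool contains(std::string const& text, std::string const& delimiter)
{
    std::string::size_type pos = text.find(delimiter);
    return pos != std::string::npos;
}
```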
The standard way of testing for bounds is with less-than: iter < FILE_NO_OF_COL. Your test iter >= FILE_NO_OF_COL is perfectly valid, but it looks weird.
Prefer pre-increment to post-increment. It makes no difference in this case, but in the general situation it can make a difference (because we use the same kind of loop with iterators). So, to be consistent and always get the best loop characteristics, use the pre-increment.
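The habit pays off in iterator loops, where a post-increment may create a temporary copy of the iterator (a minimal sketch):

```cpp
#include <vector>

// Sum a vector with the canonical iterator loop; note the pre-increment.
int sum(std::vector<int> const& values)
{
    int total = 0;
    for (std::vector<int>::const_iterator it = values.begin();
         it != values.end();
         ++it)               // Pre-increment: no temporary copy needed.
    {
        total += *it;
    }
    return total;
}
```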
You would not need to clear token if it were local:
token.clear();
This bit of code is really over-complex:
token_pos = buffer.find(delimiter, 0);
token = buffer.substr(0, token_pos);
if( !token.empty() )
{
    buffer.erase(0, token_pos + strlen(delimiter));
    file_content[token_count][iter] = new char[token.length() + 1];
    memset(file_content[token_count][iter], '\0', (token.length() + 1) * sizeof(char));
    token.copy(file_content[token_count][iter], token.length(), 0);
}
}
- You find the delimiter.
- You copy the token into another object, which would clear it (so the clear() above is useless).
- You manually allocate memory.
- You manually clear all the memory.
- You do a second copy.

Note that this second copy is dangerous: std::string::copy does not copy a terminating '\0' to make the result a real C-string, so without the memset you would lose all information about where the string ends.
So you are copying the string twice, and you are manually allocating memory with no definition of how that memory should be released.
Also, any unset members of the array keep their original values (which may be misleading). Either the caller of the function has to make sure the array is correctly nulled out before calling (which your interface does not make clear), or you should be nulling out the undefined cells yourself.
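If the C-array interface must stay, the function can null every cell up front. This sketch also shows passing an array by reference, as mentioned earlier; ROWS and COLS stand in for your FILE_NO_OF_ROWS / FILE_NO_OF_COL:

```cpp
#include <cstddef>

// Null out every cell so that unread cells are unambiguously empty.
// Taking the array by reference keeps the exact dimensions in the type.
template <std::size_t ROWS, std::size_t COLS>
void clearCells(char* (&file_content)[ROWS][COLS])
{
    for (std::size_t row = 0; row < ROWS; ++row)
    {
        for (std::size_t col = 0; col < COLS; ++col)
        {
            file_content[row][col] = nullptr;
        }
    }
}
```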
std::getline(file_is_obj, buffer, delimiter), to return the string up to but not including the delimiter? – tinstaafl yesterday
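Indeed. One caveat: that overload of getline takes a single char delimiter, while the original interface takes a const char *. Under that assumption, the whole find/substr/erase dance collapses to a loop like this (the function name is mine):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one line into tokens. getline with a delimiter returns the
// string up to, but not including, the delimiter each time.
std::vector<std::string> split(std::string const& line, char delimiter)
{
    std::vector<std::string> tokens;
    std::stringstream stream(line);
    std::string       token;
    while (std::getline(stream, token, delimiter))
    {
        tokens.push_back(token);
    }
    return tokens;
}
```

No manual allocation, no memset, no second copy, and nothing for the caller to free.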