Bug 209 - support for CSV format I/O
Summary: support for CSV format I/O
Status: NEW
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - general
Version: 3.0
Hardware: All All
Importance: --- enhancement
Assignee: Nobody
URL:
Whiteboard:
Keywords:
Depends on: 622
Blocks:
Reported: 2011-03-02 15:15 UTC by Gael Guennebaud
Modified: 2014-03-20 18:54 UTC (History)

Attachments
stream operator>> for DenseBase (2.98 KB, text/x-chdr)
2014-03-20 13:52 UTC, Kai M.
Dynamic input for vectors (1.19 KB, text/plain)
2014-03-20 14:44 UTC, Christoph Hertzberg

Description Gael Guennebaud 2011-03-02 15:15:40 UTC
a good suggestion from mattd (http://forum.kde.org/viewtopic.php?f=74&t=93733&p=190551#p190474):

""
Perhaps CSV support would be possible in the future?

It's standardized (RFC 4180) and seems to be _the_ lowest-common-denominator data-exchange format supported by a very wide range of software, from spreadsheets (e.g. Gnumeric, Microsoft Excel, Open Office), through scientific computing software (e.g. Mathematica, MATLAB, GNU Octave, Scilab), to a variety of statistical packages (E-Views, Ox, OxMetrics, R, SAS, STATA, ...).

Ideally, overloading the operator<< so that in addition to current functionality it accepts a csv-file as a right-operand would fit with the way Eigen does things -- and "comma-initialization" description would still fit perfectly!
""

I'm not convinced that operator<< is the right choice for this, but why not.
Comment 1 Christoph Hertzberg 2014-03-04 15:33:51 UTC
I think it would make more sense to overload operator>>(std::istream&, Eigen::DenseBase<Derived>&), which would be compatible with standard C++ I/O. Of course, the istream can be an ifstream.
One question is whether CSV streams should always be accepted, or only with something like:
  MatrixXd A;
  stream >> A.format(...); // with some fitting format description.
Comment 2 Matt 2014-03-04 17:02:22 UTC
Hi!

I agree with the idea of following the Standard Library.

My thinking is that it would be good to support both std::istream (including std::ifstream) and std::ostream (including std::ofstream), with the respective use of operator>> for input (input_stream >> eigen_object) -- and operator<< for output (output_stream << eigen_object).
Comment 3 Christoph Hertzberg 2014-03-04 17:23:04 UTC
CSV output is already possible, using something like this:
  IOFormat CSVFmt(FullPrecision, 0, ", ");
// or
  IOFormat CSVFmt(FullPrecision, DontAlignCols, ",\t");
// and then
  std::cout << A.format(CSVFmt);
Comment 4 Kai M. 2014-03-20 13:52:43 UTC
Created attachment 434 [details]
stream operator>> for DenseBase

I overloaded std::istream &operator>>(std::istream &s, DenseBase<Derived> &m). The source is attached as an eigen.h file.
The implementation makes it possible to read (probably) any format that the IOFormat class can express.

What does work:
 - formats mentioned in http://eigen.tuxfamily.org/dox/structEigen_1_1IOFormat.html
 - CSV Format with IOFormat(FullPrecision, 0, ",") // no blank in ","!!!
   I tested it with LibreOffice, which turned out to be quite complicated because LibreOffice does not honor the file format of the source file.

What does not work (yet):
 - Matrix needs either fixed dimensions or a known size. Eigen::Dynamic is not possible because I don't know how to resize a DenseBase object. Also, runtime detection of the 'data size' in the stream is really ugly and not possible for every IOFormat. (In the end I removed this code.)
 - No runtime selection of IOFormat. The format has to be selected by defining EIGEN_DEFAULT_IO_FORMAT.

Hope this helps,
Kai
Comment 5 Christoph Hertzberg 2014-03-20 14:20:37 UTC
(In reply to comment #4)
> Created attachment 434 [details]
> stream operator>> for DenseBase

Nice start. Perhaps we can finalize I/O for 3.3

> What does not work (yet):
>  - Matrix needs either fixed dimensions or a known size.

There is a resize(rows, cols) method defined in DenseBase with a meaningful specialization for Matrix<...> and Array<...>.
But I agree that this is not easy to integrate. First, not every format allows automatic detection of where a matrix (or a row of it) ends. Second, resizing is expensive, because all data needs to be copied to the resized matrix for each resize operation.
A feasible solution would involve some temporary data storage, e.g. in a std::vector<Scalar>.
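
A minimal sketch of the temporary-storage idea, assuming plain comma-separated numeric rows and using only the standard library. The names CsvBuffer and readCsvBuffer are hypothetical and not part of Eigen. The coefficients are collected in a std::vector<double> while the shape is detected, so the final matrix would need to be sized only once.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper type (not an Eigen class): coefficients in
// row-major order plus the shape detected while reading.
struct CsvBuffer {
    std::vector<double> values;
    std::size_t rows = 0;
    std::size_t cols = 0;
};

// Reads comma-separated numeric rows until the end of the stream.
// Throws std::runtime_error on non-numeric fields or ragged rows.
inline CsvBuffer readCsvBuffer(std::istream& in) {
    CsvBuffer buf;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;                                 // skip blank lines

        std::size_t colsInRow = 0;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ',')) {
            std::size_t used = 0;
            double value = 0.0;
            try {
                value = std::stod(field, &used);
            } catch (const std::exception&) {
                throw std::runtime_error("non-numeric field: '" + field + "'");
            }
            // Only trailing blanks may follow the number.
            if (field.find_first_not_of(" \t", used) != std::string::npos)
                throw std::runtime_error("trailing characters in field: '" + field + "'");
            buf.values.push_back(value);
            ++colsInRow;
        }

        if (buf.rows == 0)
            buf.cols = colsInRow;  // the first row fixes the column count
        else if (colsInRow != buf.cols)
            throw std::runtime_error("row length differs from first row");
        ++buf.rows;
    }
    return buf;
}
```

With Eigen, such a buffer could presumably be copied into a matrix in one step, for example through an Eigen::Map over buf.values.data() with a row-major matrix type; that step is left out here to keep the sketch free of dependencies.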

>  - No runtime selection of IOFormat. The format has to be selected by defining
> EIGEN_DEFAULT_IO_FORMAT.

I would suggest a syntax equivalent to the WithFormat output (as returned by format()), e.g.

  Matrix4d A;
  stream >> A.withFormat(IOFormat(...));

Furthermore, it would be nice to allow more liberal input formats, i.e. optionally ignore all whitespace or have formats which allow all of
  1 2;3 4
  [[1 2][3 4]]
  [1,2;3,4]
  [[1,2][3 4]]
  etc
But that comes close to writing a full parser, especially if accepted formats shall be definable at runtime.
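
For the fixed set of examples listed above, a small hand-written scanner may already be enough. The following is only a sketch under stated assumptions, not a proposed Eigen API: the function name parseLiberal is hypothetical, '[' is ignored, ']' / ';' / newline end a row, and ',' or other whitespace separate coefficients. Making these rules configurable at runtime is exactly the part that would grow into a real parser.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Rows = std::vector<std::vector<double>>;

// Hypothetical liberal reader: accepts "1 2;3 4", "[[1 2][3 4]]",
// "[1,2;3,4]" and mixtures of these notations.
inline Rows parseLiberal(const std::string& text) {
    Rows rows;
    std::vector<double> current;  // coefficients of the row being read
    std::string token;            // characters of the number being read

    auto flushToken = [&] {
        if (token.empty()) return;
        std::size_t used = 0;
        double value = 0.0;
        try {
            value = std::stod(token, &used);
        } catch (const std::exception&) {
            used = 0;  // reported below as a bad number
        }
        if (used != token.size())
            throw std::runtime_error("bad number: '" + token + "'");
        current.push_back(value);
        token.clear();
    };

    auto flushRow = [&] {
        flushToken();
        if (current.empty()) return;  // e.g. the outer closing bracket
        if (!rows.empty() && rows.front().size() != current.size())
            throw std::runtime_error("rows differ in length");
        rows.push_back(current);
        current.clear();
    };

    for (char c : text) {
        if (c == '[') {
            flushToken();  // an opening bracket carries no information here
        } else if (c == ']' || c == ';' || c == '\n') {
            flushRow();    // row terminators
        } else if (c == ',' || std::isspace(static_cast<unsigned char>(c))) {
            flushToken();  // coefficient separators
        } else {
            token += c;    // part of a number
        }
    }
    flushRow();            // input may end without a terminator
    return rows;
}
```

All four notations from the list are read as the same 2x2 result by this sketch; input such as "[1 2; 3]" is rejected because the rows differ in length.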

We also need to decide what to do if the format does not fit the input or the input does not suffice to fill the matrix. I don't think assertions are a good way whenever user input is involved.
Basically, I see two alternatives:
1. Set the stream status to bad -- that's what C++ streams generally do for bad input. I don't really like this, because it needs manual checking after each input and is likely to be forgotten, leading to subtle bugs.
2. Throw an exception. I would generally prefer this if things can go wrong depending on actual input. However, this does not work if compiled with exceptions disabled.
Comment 6 Christoph Hertzberg 2014-03-20 14:44:12 UTC
Created attachment 435 [details]
Dynamic input for vectors

For reference, this is what I once wrote for inputting dynamically sized vectors. It lacks many features, such as custom braces/separators, and it does not work for matrices.
Comment 7 Matt 2014-03-20 18:54:16 UTC
Hi!

If I may tune in :-)

(a) Input formats.

This sounds like a good candidate for a nice-to-have feature, but I think it is also important to prioritize. Perhaps the first priority should be to finish RFC 4180 CSV support (since that's the most widely supported lowest-common-denominator input/output format) -- and only once that's done consider more advanced / customized I/O formats. Thoughts?

(b) Input format errors.
I've found the following informative:
http://gehrcke.de/2011/06/reading-files-in-c-using-ifstream-dealing-correctly-with-badbit-failbit-eofbit-and-perror/

Ideally, the design would be such that users who stick to idiomatic C++ cannot normally even write code that forgets to check the error state. In general, I think it's a good idea to follow the established (and thus well-known) conventions of the C++ Standard Library (essentially the Principle of Least Astonishment).

Perhaps it would also be reasonable to behave like boost::lexical_cast when a user-requested numeric conversion encounters a non-numeric value, i.e., to throw:
http://www.boost.org/doc/libs/release/doc/html/boost_lexical_cast/synopsis.html#boost_lexical_cast.synopsis.bad_lexical_cast

That being said, an interesting alternative may be that of Boost.Math -- let the user choose the error handling policy at compile-time.

See:
http://boost.org/libs/math/doc/html/math_toolkit/error_handling.html
http://boost.org/libs/math/doc/html/math_toolkit/stat_tut/weg/error_eg.html

Note how the users are allowed to choose from among the following error handling policies:
"The available actions are:
- throw_on_error: Throws the exception most appropriate to the error condition. 
- errno_on_error: Sets ::errno to an appropriate value, and then returns the most appropriate result 
- ignore_error: Ignores the error and simply the returns the most appropriate result. 
- user_error: Calls a user-supplied error handler."

Different application areas and different use cases may imply that different error handling policies are optimal -- perhaps it's very legitimate to open it up as a customization area?
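
A compile-time error policy in this spirit could be sketched as follows. This is only an illustration: the policy types ThrowOnError, ErrnoOnError and IgnoreError and the function parseField are hypothetical names loosely modeled on the Boost.Math actions quoted above, not existing Eigen or Boost interfaces, and the choice of EDOM and NaN as "most appropriate" values is an assumption.

```cpp
#include <cerrno>
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical policies: each decides what a failed conversion yields.
struct ThrowOnError {
    static double handle(const std::string& field) {
        throw std::invalid_argument("not a number: '" + field + "'");
    }
};

struct ErrnoOnError {
    static double handle(const std::string&) {
        errno = EDOM;  // assumption: EDOM marks an invalid input field
        return std::numeric_limits<double>::quiet_NaN();
    }
};

struct IgnoreError {
    static double handle(const std::string&) {
        return std::numeric_limits<double>::quiet_NaN();
    }
};

// Converts one text field to double; the policy is chosen at compile time.
// std::strtod is used so that no exception is involved unless the policy throws.
template <typename ErrorPolicy = ThrowOnError>
double parseField(const std::string& field) {
    const char* begin = field.c_str();
    char* end = nullptr;
    const double value = std::strtod(begin, &end);
    if (end != begin && *end == '\0')
        return value;                  // the whole field was consumed
    return ErrorPolicy::handle(field); // empty, partial or non-numeric field
}
```

Code compiled with exceptions disabled could then select ErrnoOnError or IgnoreError, while other users keep the throwing default, for example `double x = parseField<ErrnoOnError>(text);`.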

In general, this offers some good advice:
http://www.boost.org/community/error_handling.html
