CSV Import issues - help please…

Hi,

Usually I try to solve all problems myself, but after days of struggle I can't crack this one, so I hope to get some support from you. I am having problems with the Krakatoa PRT Loader importing CSV files.

When trying to open some CSV files I get the following error. It pops up when updating the particle count or when trying to render.

This is Max 2012 x64, and the problem appears in every Krakatoa version I tested (2.0.1, 2.0.2, and Beta 2.1.5).

Please read my notes below carefully.
I am importing a file saved in ANSI/Windows-1252 encoding (I also tried Unicode).

  • Six values per record (XYZRGB); tried with headers and without - it does not matter.
  • Files are split into 4 MB parts. I have already succeeded in importing files as large as 150 MB - it is not a matter of file size.
  • No empty lines.
  • CRLF line endings.
  • It is NOT a syntax problem - copy-pasting the first 100 records into a test file works, and the particles DO render.
  • I also tried different syntax and delimiters (space, comma, comma plus space, etc.) - no impact.

Sample data looks as follows:

-1.8862737 5.2339072 37.398249 104 108 93
-2.4525737 4.2608069 33.129249 63 55 62
-1.4356736 5.3109071 35.764948 172 146 109
-1.1466736 5.115307 33.800747 169 149 116
-2.4857737 0.50730695 34.622748 93 92 100
-3.4820737 2.0735069 37.966948 48 46 63
-2.4899737 4.2738072 33.507347 117 116 105
-3.4274737 3.628607 37.560549 99 97 106
-0.98207364 5.2002073 36.011748 107 77 53
-0.30177358 5.2680071 36.000148 187 166 140
-3.1990736 3.016807 34.475349 52 51 69
-2.5097736 4.2439071 33.66825 123 113 100
-3.1788737 1.6515069 32.761149 70 65 79
-2.9224736 4.1410071 37.371051 127 128 121

Taking my tests even further, I have isolated two files which seem to be identical in terms of how they were created, their size, syntax, encoding, etc., yet one of them imports properly and the other does not. You can download these two files here:

http://bit.ly/Uc4e5m

Importing or rendering in the Krakatoa PRT Loader gives me the same error.

In Particle Data Explorer I can parse the first rows, but when I try to load more it also crashes with the same message.
The files come from splitting a larger file (~700 MB), which I am also unable to load.

I don't think the file is corrupt, since I was able to open and manipulate it in various text editors and point-cloud data processors.
I have tried importing the files in Leica Cyclone, and the problem seems to be the same for that software, but it additionally reports which lines are problematic.
The line containing the so-called 'bad' record is 52229 in the file “Exterior_xab.csv”.

I investigated these lines and they look normal; nothing suspicious. The middle line below is 52229:

7.9950272 -19.767094 7.9689474 82 60 36
9.9177268 -19.351793 7.1968475 78 54 28
8.5610273 -19.511994 7.6186476 79 65 36

I think I have tried everything, so I would appreciate any help with this issue.
Thanks.

The problem in Exterior_xab-broken.csv is on line 62230, which contains a single number. The middle line below is 62230:

8.4942273 -19.642394 7.2004475 52 50 27
171525
1.0829265 -21.459695 12.676648 117 100 80
(We should add better error messages to our CSV importer to help track down such problems.)

Was the larger file originally a PTS file? PTS files have a particle count before each block of particle data. Newer versions of Krakatoa can read PTS files, but they must have a .pts file extension.
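To illustrate (this is a hypothetical sketch, not Krakatoa's parsing code): since a PTS block count occupies a line of its own, a pre-filter like the following could strip those count lines when converting PTS data to plain CSV, assuming whitespace-delimited records:

```python
def strip_pts_block_counts(lines):
    """Drop lines that contain only a single integer (a PTS block count),
    keeping the whitespace-delimited particle records."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) == 1 and fields[0].lstrip("-").isdigit():
            continue  # block-count line such as "171525"
        if fields:  # also skip blank lines
            kept.append(" ".join(fields))
    return kept

sample = [
    "8.4942273 -19.642394 7.2004475 52 50 27",
    "171525",  # particle count preceding the next PTS block
    "1.0829265 -21.459695 12.676648 117 100 80",
]
print(strip_pts_block_counts(sample))
```

Run over the sample, only the two six-field records survive.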

Hi Paul,

Many thanks for solving my problem. With access to the source and a debugger, it was probably easy to find out; it does sound trivial indeed.
Yes, my data originally comes from PTS, and I was aware of the header line showing the count, which I had removed, but I did not know about the counts between blocks.

Could you please let me know from which version Krakatoa parses the PTS format?
Also, does it parse only proper XYZIRGB .pts files, or does it handle the XYZRGB flavor as well?

Since more and more point-cloud data is being used, I would really suggest including the following in upcoming releases:

  1. Precise information about parsing errors
  2. The option to simply ignore lines that cannot be parsed
  3. Support for variable file encodings (post studios usually work on multiple platforms -> OSX/Linux/Windows) and line-break standards
  4. A GUI that allows for import customization.

Regarding #3, the idea is to give maximum flexibility for importing various text-file flavors and applying initial mappings.
It could consist of:

  • Customizing syntax (selecting delimiter types, etc.)
  • Mapping columns to particular channels

An initially parsed fragment of the file would be displayed in columns, and each column would have a selector to assign a channel mapping.
Good examples are MS Excel (at least there's something good about it!) and Leica Geosystems Cyclone.

Many thanks for all your help.

Oh good, you found the issue :)

I was at a bit of a loss.

I am a bit surprised that whatever wrote the data didn’t catch the bad line(s).

Hi Paco,

Strangely enough, I was not aware that recent Krakatoa versions can load PTS and PTX files. Since I am responsible for the documentation, that explains why this is not documented anywhere :) Your best bet is to download the latest public Beta build, 2.1.5, from thinkboxsoftware.com/krakato … ilds-beta/
The PRT Loader does not list these formats as supported, but if you switch to All Files (.) and pick a PTS or PTX file, it will load. If the file name does not contain a frame number, also be sure to check the “Load Single Frame Only” option.

That being said, loading large PTS and PTX files can be very slow. If you have LIDAR data with millions of samples (I loaded a file with 27 million points and it took around a minute to parse), note that we ship a PTS/PTX-to-PRT command-line converter utility with our Frost plugin. When selecting a PTS or PTX file in Frost, a dialog pops up that lets you control the conversion. We intend to add the same workflow to the Krakatoa PRT Loader, because direct loading just makes no sense speed-wise.

Note that the “Load First N” option does not work for PTS and PTX because we don’t know the total count in advance. So they always load in Every Nth mode, and the whole file is read every time.

Try downloading the latest Frost public Beta build from here: thinkboxsoftware.com/frost-d … -releases/

I completely agree with your point 3). In fact, we have had that on our to-do list for a while. Recently we added support for pretty much any delimiter, and we want to allow custom remapping of data content to data channels.

For text files without a header line, we now also import all channels we find. For example, if you have a CSV file that looks like
0.0 1.0 2.0 42.0 33.0 100.0 2.34
1.0 1.0 2.0 44.0 43.0 200.0 2.35
we assume the first three columns to be Position X, Y, and Z, and the rest are imported as “Data1”, “Data2”, “Data3”, and “Data4” of type float32[1].
This lets you access these columns in a Magma modifier and use ToVector to build any channels that are supposed to be vectors, e.g. combine Data1, Data2, and Data3 into Color and remap Data4 to Density…
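A rough sketch of that headerless naming rule as described here (the exact channel-name strings are an assumption for illustration, not taken from Krakatoa's source):

```python
def headerless_channel_names(column_count):
    """First three columns map to Position components; any remaining
    columns become Data1, Data2, ..., mirroring the rule described above.
    The name strings themselves are illustrative."""
    names = ["Position.X", "Position.Y", "Position.Z"]
    names += ["Data%d" % i for i in range(1, max(column_count - 2, 1))]
    return names[:column_count]

print(headerless_channel_names(7))
# → ['Position.X', 'Position.Y', 'Position.Z', 'Data1', 'Data2', 'Data3', 'Data4']
```

For the 7-column example above, the four trailing columns land in Data1 through Data4, ready for ToVector remapping in Magma.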


We felt this was a good temporary workaround, but we might add a more sophisticated UI for the remapping in the future…

Krakatoa Beta 2.1.2 and later can parse PTS files. However, as Bobo noted above, you must change “Files of type:” to “All(.)” to pick a .pts file.

Yes, it can parse XYZIRGB, XYZRGB, XYZI, and XYZ files. (The “I” part is imported as a float32[1] “Intensity” channel. “XYZ” is imported as “Position”, and “RGB” is imported as “Color”.)
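As an illustration of those four flavors, a simple column-count lookup (an assumption for demonstration only, not Krakatoa's actual detection logic) could classify records like this:

```python
# Illustrative lookup matching the flavors listed above.
PTS_LAYOUTS = {
    3: ("Position",),                       # XYZ
    4: ("Position", "Intensity"),           # XYZI
    6: ("Position", "Color"),               # XYZRGB
    7: ("Position", "Intensity", "Color"),  # XYZIRGB
}

def classify_pts_record(line):
    """Guess the channel layout of a whitespace-delimited record
    from its field count; returns None for unrecognized counts."""
    return PTS_LAYOUTS.get(len(line.split()))

print(classify_pts_record("8.5610273 -19.511994 7.6186476 79 65 36"))
# → ('Position', 'Color')
```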

Thank you for your suggestions regarding our importer! I have added your suggestions to our wish list.

Guys,

Many thanks for the heads-up.
I had already managed to work around this via sed, which is an amazing stream editor, plus a Python script that processes the 4 GB files, cleaning out problematic lines and adding proper headers.
Worked like a charm.
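A minimal sketch of that kind of cleanup, assuming six whitespace-delimited numeric fields per record and CRLF output (this is an illustration of the approach, not the exact script described above):

```python
def _is_xyzrgb_record(fields):
    """True if the fields form exactly six numeric values."""
    if len(fields) != 6:
        return False
    try:
        for f in fields:
            float(f)
    except ValueError:
        return False
    return True

def clean_xyzrgb(src_path, dst_path):
    """Stream a large text file, keep only valid XYZRGB records,
    and report (kept, dropped) line counts."""
    kept = dropped = 0
    # newline="" so the CRLF we write is not translated by Python
    with open(src_path) as src, open(dst_path, "w", newline="") as dst:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # silently skip blank lines
            if _is_xyzrgb_record(fields):
                dst.write(" ".join(fields) + "\r\n")  # CRLF, as in the source files
                kept += 1
            else:
                dropped += 1  # e.g. a stray PTS block count
    return kept, dropped
```

A header line could be prepended afterwards; the exact header syntax Krakatoa expects is not shown here.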

That is awesome! I did not know about it and went the other way around: I had already converted all the data to .prt by rendering to file. Since the PRT format specification is available, I even thought about writing a simple converter. Good to know one already exists.

Just to clarify, regarding text files: do you support all encodings and line-ending types?
I was missing that in the documentation.

Sorry, what do you mean by “encodings” here?

One more question. It is slightly off topic now, but I guess the thread above already covers enough for other readers.

Assuming that I already have laser-scan data in Krakatoa and want to speed up loading…
You have a decent feature for reading every Nth point or just the first N points. Reading the first N points is obviously faster, so I am thinking about using it. Now the questions:

-> Is the Krakatoa PRT format organized in a way that stores a brief overview of the entire space at the beginning of the file and then gets into detail further in?
Many video codecs are built that way, letting you read every Nth pixel easily for low-res playback.

Even if it does not work that way, I guess we could easily make it possible by sorting the point-cloud data so that the file has the following structure:

Beginning of file
Every 1,000,000th point (spread evenly across space)
Every 100,000th point (evenly)
Every 10,000th point (evenly)
Every 1,000th point (evenly)

If you supported writing data to file in this order as an optimisation, it would make all preview workflows much faster.
If no such optimisation is done while generating PRT files now, can you advise some methods to prepare this kind of sorting via Magma scripts or any other way?
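A sketch of the proposed coarse-to-fine reordering, striding over point indices (for true spatial evenness the points would first need a spatial sort or shuffle; this only illustrates the file-layout idea):

```python
def coarse_to_fine(points, strides=(1_000_000, 100_000, 10_000, 1_000, 1)):
    """Reorder points so any 'load first N' prefix is a progressively
    denser sample: every 1,000,000th point first, then every 100,000th
    point not yet emitted, and so on down to the full set."""
    seen = set()
    ordered = []
    for stride in strides:
        for i in range(0, len(points), stride):
            if i not in seen:
                seen.add(i)
                ordered.append(points[i])
    return ordered

# With a tiny list and small strides the interleaving is easy to see:
print(coarse_to_fine(list(range(10)), strides=(5, 1)))
# → [0, 5, 1, 2, 3, 4, 6, 7, 8, 9]
```

A file written in this order would make “Load First N” behave like a level-of-detail preview.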

UTF-8, UTF-16, Windows-1252 (= ANSI = Latin-1), etc…

Originally we followed the RFC 4180 spec in our CSV implementation, so we assumed the ASCII character set, CRLF line endings, and no leading bytes or magic numbers:
tools.ietf.org/html/rfc4180

Recently, we expanded support for various delimiters, but AFAIK we do not expect or accept Unicode. There is really no strong need for it, since the data is just digits, and the optional header's channel names should be English characters (case-sensitive).
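For anyone unsure whether their file is pure ASCII, a quick illustrative check (a BOM or any multi-byte character shows up as bytes above 0x7F):

```python
def find_non_ascii_bytes(path, limit=5):
    """Return (offset, byte) pairs for the first few non-ASCII bytes,
    a quick way to verify a file meets the RFC 4180 ASCII assumption."""
    hits = []
    with open(path, "rb") as f:
        data = f.read()
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            hits.append((offset, byte))
            if len(hits) >= limit:
                break
    return hits
```

An empty result means the file is plain ASCII; hits at offsets 0–2 typically indicate a UTF-8 BOM.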

If you have a case of a CSV file that is not pure ASCII, please show us and we might look into supporting it.