What Characters Need To Be Escaped In Csv Imports (And How To Escape)?

My database contains text with normal punctuation and I am wondering which characters need to be escaped and how to do it. I have found that “#” in Alt text will break the things when importing Images CSV files. Simple enough to replace and eliminate it in this case; I either avoid it or substitute "No. " for “#”. It would be nice to know how to escape it. CS-Cart uses JSON, which itself uses the “#”. There may be other systems which need to be escaped at the same time. There are other characters which are obviously reserved characters. Semicolons and colons are used, as can be verified by looking at features imports, as well as various brackets. I don't know if there are others.



My description fields might possibly contain any punctuation such as single or double quotes, exclamation points, question marks, colons and semicolons, possibly others. Using an HTML entity involves the ampersand and semicolon, which I don't know the effects of on CS-Cart.



Can anyone help with a short list of characters which need to be escaped and in what fields?

If you use TAB you should be ok, if you export from Excel or Free Office make sure you select edit filters and choose TAB and “encase all text fields”

The # symbol is special to cs-cart, not to the CSV. If possible, I would use the html equivalent instead (but not sure how you're using it).



Generally whatever software you are using to create your CSV should do the correct quoting for characters in the input stream so they they are interpreted as data and not as format specifiers (I.e. quoting columns in double quotes to escape commas or semi-colons in the data and/or using double double-quotes for escaping double-quotes.



CSC cart uses several characters as delimiters within the data. Depending on which import you need to use, might depend on which characters you need to handle. Again, if these are in description fields, I would do an htmlspecialchars() on the input stream before using it in the description fields.



You will have to check with cs-cart on how to handle special chars like #, @, {, }, (, ), etc. that are used internally.



You should avoid using these if possible. I.e. don't use an image alt value that contains '#' character since that character is used to separate the image name from the alt-name.



Sorry for long winded response, but there are several factors to consider on what should be escaped and how.

[quote name='kickoff3pm' timestamp='1392322773' post='177398']

If you use TAB you should be ok, if you export from Excel or Free Office make sure you select edit filters and choose TAB and “encase all text fields”

[/quote]



Thanks for response. I'm exporting from Filemaker Pro rather than a spreadsheet, but I do use TAB delim. Does “encase all text fields” just put the contents between double quotes? I could do that with a Filemaker calculation.

[quote name='tbirnseth' timestamp='1392325742' post='177404']

The # symbol is special to cs-cart, not to the CSV. If possible, I would use the html equivalent instead (but not sure how you're using it).



Generally whatever software you are using to create your CSV should do the correct quoting for characters in the input stream so they they are interpreted as data and not as format specifiers (I.e. quoting columns in double quotes to escape commas or semi-colons in the data and/or using double double-quotes for escaping double-quotes.



[/quote]



By HTML equivalent do you mean and HTML entity? Format for entity is "&char code; and I was worried about ampersand and semicolons.



I'm using Filemaker Pro and exporting my CSV files as TAB delim. Looking at the files in TextWrangler, I saw that Filemaker only puts the TABs in a TAB delim CSV, nothing additional. So I double quote the entire field before export (I do this for all fields as they are all text in a CSV). I wasn't sure if this escaped all punctuation but it seemed to work. I had doubt's about double-quotes in the field content. You said double double-quotes will escape double-quotes. If I have something like “Moby Dick” in the field, then before exporting my CSV I should replace it with ““Moby Dick”” even if I have double-quoted the entire field?


[quote name='tbirnseth' timestamp='1392325742' post='177404']

CSC cart uses several characters as delimiters within the data. Depending on which import you need to use, might depend on which characters you need to handle. Again, if these are in description fields, I would do an htmlspecialchars() on the input stream before using it in the description fields.



You will have to check with cs-cart on how to handle special chars like #, @, {, }, (, ), etc. that are used internally.



You should avoid using these if possible. I.e. don't use an image alt value that contains '#' character since that character is used to separate the image name from the alt-name.



Sorry for long winded response, but there are several factors to consider on what should be escaped and how.

[/quote]



I have decided to avoid these characters. Thanks for comments. This is a tricky area.

[quote]If I have something like “Moby Dick” in the field, then before exporting my CSV I should replace it with ““Moby Dick”” even if I have double-quoted the entire field?[/quote]

Only if you have double quoted the whole column, otherwise I believe they are literal



Generally, it's not that tricky. Just don't put a # in an image alt name and quote all columns and you should be fine. If you find a description being cut-off, then go find the culprit and deal with the exception.

For a basic product I only use the following;



“Product code” “Category” “Price” “Weight” “Quantity” “Detailed image” “Product name” “Description” “Status”

quote all columns and do not use a '#' in the “Detailed image” column and you should be fine.

[quote name='tbirnseth' timestamp='1392362514' post='177447']

Only if you have double quoted the whole column, otherwise I believe they are literal



Generally, it's not that tricky. Just don't put a # in an image alt name and quote all columns and you should be fine. If you find a description being cut-off, then go find the culprit and deal with the exception.

[/quote]



Thanks for reply. Will do as suggested and hopefully things will go smoothly.

[quote name='kickoff3pm' timestamp='1392370572' post='177462']

For a basic product I only use the following;



“Product code” “Category” “Price” “Weight” “Quantity” “Detailed image” “Product name” “Description” “Status”

[/quote]



I'm using lots more fields- 27 in all. Most of your's except Detailed image (doing a separate Images import) and weight. Would like to cut it back for simplicity's sake but not sure of default values for feature comparison, zero price actions, etc.



As long as headers are in the file will any field order be OK?

Order of columns does not matter.

[quote name='tbirnseth' timestamp='1392415495' post='177507']

quote all columns and do not use a '#' in the “Detailed image” column and you should be fine.

[/quote]



I followed suggestions and the import went fine. Thanks to all who contributed to this thread.

bit of an old thread but what are the default values if you dont import them (like merchantman asked above).



ie what happens if I don't set language to en(glish) (will it come up in russian?), put in quantity (will it set it to 0,1 or infinity ?), min/max qty etc

If the column is not present it will not be updated. If it's a new entry it will be whatever is the default in the database.

Got it.

Thanks for your reply.