CSV delimiter and JOIN
Convert Complex CSV with custom delimiter.
jq is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.
In a previous post I covered the task of converting JSON to CSV.
- Reference: Convert CSV to JSON.
If you have a situation where the data contains commas, then the following post might be of some use.
I will assume you are starting from a Spreadsheet (I am using Google Sheets JOIN
, but if you are using Excel use the TextJoin
command).
Manipulating Data
Lets look at example spreadsheet data:
- I have data similar to that shown below:
A | B | C | |
---|---|---|---|
1 | tag | question | answer |
2 | 1 | You are a manager, and you want to build a new application. | Visit YouTube |
In the above the first row is a single header i.e. tag, question, answer The second row is the start of iterable dataset.
Now suppose you wish to export the above as a CSV?
As the question column already includes a comma in its text, the output will not be representative of the desired output. Remember, a CSV expects a comma to be used a delimiter, so by including an additional comma, you are breaking the agreed parsing protocol.
Clearly, the more text you include, the more of an issue this becomes. You could pre-process the information to remove the offending comma. However there is an easier way to handle this situation using the functionality of the spreadsheet.
Enter the join
Instead of exporting the data as is, create a new tab.
In our example, I create a second sheet and name it Sheet2
.
The new tab will use a join
command to indicate an appropriate character delimiter and the text to be joined.
Typically I use the *
for this purpose, but you can substitute whatever character provides a unique delimiter for your use case.
In Google Sheets
the command looks similar to below:
1=JOIN("*", sheet1!$A$1:$C$2)
The command instructs Google Sheets to use the delimiter *
.
For the defined array in Sheet1 using the block A1:C2, join each cell with the delimiter.
Use this command to generate new rows per line in Sheet1
.
The output is now a single column in the sheet!
A | B | C | |
---|---|---|---|
1 | tag*question*answer | ||
2 | 1*You are a manager, and you want to build a new application.*Visit YouTube |
Rendering a single column is important as we dont want multiple delimiter featured in the output
Now from the tab containing the join
command download the information as a CSV.
The content downloaded will feature the custom delimiter rather than the standard comma.
1tag*question*answer
21*You are a manager, and you want to build a new application.*Visit YouTube
You can now process the CSV to JSON as per normal - just remember to specify that the delimiter used is *
(asterisk).