Add UTF8 BOM to csv export files #8637

Closed
LeeDr opened this issue Oct 12, 2016 · 5 comments · Fixed by #8662
Labels: bug (Fixes for quality problems that affect the customer experience), PR sent


@LeeDr
Contributor

LeeDr commented Oct 12, 2016

Kibana version: 5.0

Elasticsearch version: 5.0

Server OS version:

Browser version:

Browser OS version:

Original install method (e.g. download page, yum, from source, etc.):

Description of the problem including expected versus actual behavior:
There have been some GitHub issues (#8598) and Discuss threads (https://discuss.elastic.co/t/resolved-korean-text-is-broken-when-open-the-exported-csv-with-excel/53436/4) about Excel not displaying UTF-8 characters correctly when it opens CSV export files. There's a workaround, but it seems like we should be writing the byte order mark (BOM) into the CSV files we create. When I do that, Excel opens the files correctly in my testing.

The workaround is:

Instead of just double-clicking the CSV file to open it in Excel, open Excel, use Data > Get External Data > From File, and check that File origin is set to 65001: Unicode (UTF-8).

Kibana does seem to write UTF-8 characters correctly into the CSV, but it doesn't write a BOM at the start of the file. Here's how you can see that:

Steps to reproduce:

  1. Go to any Kibana visualization.

  2. Click the small arrow to collapse the visualization and show the Export links.

  3. I exported Raw, but I'm fairly sure the only difference between Raw and Formatted is how dates are exported.

  4. In my case (the CPU Usage visualization from Metricbeat) the data was:

    $ cat '/c/Users/Lee/Downloads/CPU Usage.csv'
    idle,sys,user
    "0.9934333333333334","0.0018000000000000006","0.0015555555555555557"
    
  5. The first three bytes show there's no BOM:

    $ head -c 3 '/c/Users/Lee/Downloads/CPU Usage.csv' | hexdump -C
    00000000  69 64 6c                                          |idl|
    00000003
    
  6. This data contains only 7-bit ASCII characters, so it opens in Excel just fine. To trigger the problem, I pasted some Kanji characters into it using Notepad++, which gives:

    idle,sys,漢字user
    "0.9934333333333334","0.0018000000000000006","0.0015555555555555557"
    
  7. Saving with Notepad++ doesn't add the BOM, and when I open the file in Excel I see the problem reported on Discuss and in the GitHub issues:
    [Screenshot: CPU Usage.csv opened in Excel, showing the problem]

  8. Notepad++ has a menu item, Encoding > Convert to UTF-8-BOM. When I use it, save the result as a new file, and check the first 3 bytes, I see the BOM:

    $ head -c 3 '/c/Users/Lee/Downloads/CPUUsageBOM.csv' | hexdump -C
    00000000  ef bb bf                                          |...|
    00000003
    
  9. When I open that file in Excel (2016 on Windows 10), it displays correctly:
    [Screenshot: CPUUsageBOM.csv opened in Excel, displayed correctly]
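The byte check done with `head -c 3 ... | hexdump -C` in steps 5 and 8 can also be done in code. This is a minimal sketch, assuming the file's bytes are already loaded into a `Uint8Array`; `hasUtf8Bom` is a hypothetical helper for illustration, not part of Kibana. It compares the first three bytes against EF BB BF, the UTF-8 encoding of U+FEFF.

```typescript
// The UTF-8 encoding of U+FEFF (the byte order mark) is EF BB BF.
const UTF8_BOM_BYTES = [0xef, 0xbb, 0xbf] as const;

// Hypothetical helper: true when the buffer starts with the UTF-8 BOM,
// the same check as `head -c 3 file | hexdump -C`.
function hasUtf8Bom(bytes: Uint8Array): boolean {
  return (
    bytes.length >= UTF8_BOM_BYTES.length &&
    UTF8_BOM_BYTES.every((b, i) => bytes[i] === b)
  );
}

// First three bytes of the Kibana export (step 5, "idl")
// and of the Notepad++ UTF-8-BOM conversion (step 8).
const exported = new Uint8Array([0x69, 0x64, 0x6c]);
const converted = new Uint8Array([0xef, 0xbb, 0xbf]);
console.log(hasUtf8Bom(exported)); // false
console.log(hasUtf8Bom(converted)); // true
```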

So I conclude that we should write the BOM at the start of the CSV export file. It's the proper i18n handling of the data.
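A minimal sketch of the proposed change, assuming the export code builds the CSV as a JavaScript string before saving it: prepend U+FEFF, which becomes the bytes EF BB BF once the text is encoded as UTF-8. `withUtf8Bom` is a hypothetical helper for illustration, not Kibana's actual export code.

```typescript
// U+FEFF; encoded as UTF-8 it becomes EF BB BF (the UTF-8 BOM).
const UTF8_BOM = "\uFEFF";

// Hypothetical helper: prepend the BOM unless the text already starts with it,
// so calling it twice never produces a doubled BOM.
function withUtf8Bom(csv: string): string {
  return csv.startsWith(UTF8_BOM) ? csv : UTF8_BOM + csv;
}

// The modified export from step 6, with Kanji in the header row.
const csv =
  "idle,sys,漢字user\n" +
  '"0.9934333333333334","0.0018000000000000006","0.0015555555555555557"\n';

// TextEncoder always encodes to UTF-8 (global in browsers and Node.js 11+).
const bytes = new TextEncoder().encode(withUtf8Bom(csv));
console.log(Array.from(bytes.slice(0, 3), (b) => b.toString(16)).join(" ")); // ef bb bf
```

In a browser, the resulting string could be passed to `new Blob([text], { type: "text/csv;charset=utf-8" })`. The Blob constructor encodes string parts as UTF-8, so the downloaded file would start with the same three bytes that Notepad++ writes in step 8.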

LeeDr added the bug and P2 labels Oct 12, 2016
@LeeDr
Contributor Author

LeeDr commented Oct 12, 2016

cc @CharlesLdy

@LeeDr
Contributor Author

LeeDr commented Oct 12, 2016

cc @Bargs

@CharlesLdy

I think this is a better way to avoid the garbled characters. My question is: will I only be able to use it in Kibana 5.0, once the fix is added? My production version is currently Kibana 4.5.4.

@kobelb kobelb self-assigned this Oct 13, 2016
@LeeDr
Contributor Author

LeeDr commented Oct 13, 2016

Since this would be a bug fix and not a new feature, it could be backported to the 4.6 branch and ship in the next release from that branch.

@kobelb
Contributor

kobelb commented Oct 13, 2016

It looks like adding the UTF-8 BOM doesn't fix every version of Excel; in particular, Excel 2011 for Mac still shows the problem. However, it does fix Microsoft Excel 2016 on both Mac and Windows, and it's the technically correct way to format the file.
