It seems rCharts doesn't work well with Chinese #549

Open
FingerLiu opened this issue Nov 14, 2014 · 18 comments · May be fixed by #552

@FingerLiu

It seems rCharts doesn't work well with Chinese

Background:

I have a CSV file containing Chinese characters, encoded in GB2312 (the system default). A sample of my CSV:

date,title,name,id,message
"2014-10-07 8:42:37","元老",879231132,879231132,"加 "
"2014-10-07 8:43:50","元老",879231132,879231132,"这么多空格,不加引号。怎么行。 "
"2014-10-07 8:45:10","新人",451635342,451635342,"想问一下,如果有一些专业词汇不懂 找谁帮忙呀? "
"2014-10-07 8:45:30","大神",532594859,532594859,"发出来,一起研究 "

Problem:

I read the file into R with read.csv, and the values print correctly in the R console.
But when I put the values into a label of an hPlot (Highcharts) chart, they are shown as gibberish (meaningless characters).
I've tried Encoding(title) <- "UTF-8" and enc2utf8(); neither works.
How can I fix this? Any idea would be very helpful.
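A small sketch of why relabelling alone cannot fix the bytes: `Encoding<-` only changes the encoding mark R keeps on a string, whereas `iconv()` (and `enc2utf8()`, which converts from the native encoding) rewrites the bytes. To avoid depending on the machine's locale, the sketch builds the GB2312 bytes with `iconv()` instead of reading the CSV, and it assumes the platform's iconv accepts the encoding name "GB2312". It does not by itself explain the chart problem; later comments trace that to how the HTML file is written.

```r
# The title "元老" from the sample CSV, written with Unicode escapes so the
# script does not depend on the encoding it is saved in
title_utf8 <- "\u5143\u8001"

# The same text as GB2312 bytes, roughly what read.csv() returns on a
# Chinese (code page 936) Windows session
gb_bytes <- iconv(title_utf8, from = "UTF-8", to = "GB2312", toRaw = TRUE)[[1]]
title_gb <- rawToChar(gb_bytes)

# Encoding<- only relabels the string: its bytes are still GB2312
relabelled <- title_gb
Encoding(relabelled) <- "UTF-8"
identical(charToRaw(relabelled), gb_bytes)  # TRUE

# iconv() rewrites the bytes and recovers the original text
converted <- iconv(title_gb, from = "GB2312", to = "UTF-8")
identical(converted, title_utf8)            # TRUE
```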

Other Info:

R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] RJSONIO_1.3-0 httr_0.5 rCharts_0.4.5

loaded via a namespace (and not attached):
[1] grid_3.1.1 lattice_0.20-29 plyr_1.8.1 Rcpp_0.11.3 stringr_0.6.2 tools_3.1.1
[7] whisker_0.3-2 yaml_2.1.13

@ramnathv
Owner

This could be a problem with Highcharts, since rCharts merely passes the data on to Highcharts. If you can post a minimal reproducible example, it would help with debugging.

@FingerLiu
Author

Thank you very much for replying.
Here is the example; the CSV is available at the link below.
I am new to R, and the encoding problem has really driven me crazy (> <).
Any suggestion would be very welcome!
ftp://115.29.147.107/share/gb2312.csv
The R code is below (the title column is where the gibberish occurs).

library(rCharts)

parsed_data <- read.csv("gb2312.csv", header = TRUE, sep = ",", quote = "\"",
                        stringsAsFactors = FALSE)

# Count messages per id; as.data.frame() gives the columns id and msgCnt
msg_counts <- as.data.frame(table(id = parsed_data$id), responseName = "msgCnt",
                            stringsAsFactors = FALSE)

# First title/name seen for each id. table() sorts by id while !duplicated()
# keeps file order, so join on id instead of binding the columns side by side
descs <- parsed_data[!duplicated(parsed_data$id), c("id", "title", "name")]
descs$id <- as.character(descs$id)
df <- merge(msg_counts, descs, by = "id")

# Keep the 20 speakers with the most messages
df <- head(df[order(df$msgCnt, decreasing = TRUE), ], n = 20)

h2 <- hPlot(
    x = "id",
    y = "msgCnt",
    data = df,
    type = "scatter",
    title = "top n speakers",
    group = "title"
)
h2

Here is my screenshot. If the image is broken, you can get it here: ftp://115.29.147.107/share/Rplot.png

@FingerLiu
Author

I found that iconv helps with some characters, but some gibberish remains:
`iconv(x, from = "gb2312", to = "utf-8")`
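When converting with `iconv()`, it helps to check which elements failed: with the default `sub = NA`, `iconv()` returns `NA` for any element it cannot convert. The sketch below wraps that check in a hypothetical helper, `gbk_to_utf8`. It converts from "GBK" rather than "gb2312" on the assumption that the file was saved in the Windows Chinese code page 936, which is Microsoft's version of GBK, a superset of GB2312; which encoding names are accepted depends on the platform's iconv.

```r
# Hypothetical helper: convert strings holding GBK (code page 936) bytes to
# UTF-8 and warn about elements iconv() could not convert (they become NA)
gbk_to_utf8 <- function(x, from = "GBK") {
  out <- iconv(x, from = from, to = "UTF-8")
  failed <- !is.na(x) & is.na(out)
  if (any(failed)) {
    warning(sum(failed), " element(s) could not be converted from ", from,
            call. = FALSE)
  }
  out
}

# Usage: GBK bytes for "元老", built with iconv() for the sake of the example,
# plus a truncated byte sequence (a lone lead byte) that cannot be converted
gbk_title <- rawToChar(iconv("\u5143\u8001", from = "UTF-8", to = "GBK",
                             toRaw = TRUE)[[1]])
gbk_to_utf8(c(gbk_title, rawToChar(as.raw(0x81))))
```

The second element comes back as `NA` together with a warning, which makes partial failures visible instead of leaving them to show up later in the chart.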

@ramnathv
Owner

Not sure how much I can help here, so I am looping in the experts on encoding, @yihui and @kohske: do you have any thoughts on what might be causing some characters to show up as gibberish?

@yihui

yihui commented Nov 16, 2014

I might be able to figure out what went wrong here, but I probably do not have enough time for it at the moment.

@ramnathv
Owner

No worries @yihui. Whenever you find the time, that would be great. Maybe @kohske will be able to help. Thanks in any case.

@saurfang

It appears this is only an issue on Windows. On a Mac, if you set the locale to UTF-8 and specify fileEncoding = "gb2312" when reading the data, everything happily lives as Unicode.

However, R on Windows doesn't appear to support a UTF-8 locale. When rCharts writes the chart to a file, the text is written in your default locale's encoding, but the HTML file declares a utf-8 charset. If we change the meta tag in the HTML to gb2312 (or manually change the encoding in the browser), it displays fine.
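The mismatch described above can be reproduced with a short sketch. It writes the same HTML twice: once through a connection that re-encodes to GB2312, standing in for the default output on a Chinese Windows locale (an assumption made so the sketch also runs on a UTF-8 machine), and once through `file(..., encoding = "UTF-8")`. It then checks whether the UTF-8 bytes of the label actually appear in each file. The helper names `write_and_read_bytes` and `contains_utf8` are made up for the example.

```r
# Text from the sample data ("元老"), written with Unicode escapes
label <- "\u5143\u8001"
html <- c('<meta charset="utf-8">', paste0("<p>", label, "</p>"))

# Write `lines` through a connection that re-encodes to `encoding`,
# then return the raw bytes that ended up on disk
write_and_read_bytes <- function(lines, encoding) {
  path <- tempfile(fileext = ".html")
  con <- file(path, open = "w", encoding = encoding)
  writeLines(lines, con)
  close(con)  # flush before reading the bytes back
  readBin(path, what = "raw", n = file.size(path))
}

# TRUE if the UTF-8 bytes of `text` occur somewhere in `bytes`
contains_utf8 <- function(bytes, text) {
  length(grepRaw(charToRaw(enc2utf8(text)), bytes, fixed = TRUE)) > 0
}

# Stand-in for a Chinese Windows session, where output is in the native code page
gb_file <- write_and_read_bytes(html, "GB2312")
# What writing through file(..., encoding = "UTF-8") produces
utf8_file <- write_and_read_bytes(html, "UTF-8")

contains_utf8(gb_file, label)    # FALSE: GB2312 bytes under a utf-8 meta tag
contains_utf8(utf8_file, label)  # TRUE: bytes match the declared charset
```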

Long story short, I think we can fix this by using file(..., encoding = "UTF-8") as the target for writeLines.

I have created a patch; you can give it a try with devtools::install_github("saurfang/rCharts", ref = "utf8-writelines").
Tested under Windows and Mac, and so far this seems to do the trick.
@ramnathv @yihui Feel free to review the code at saurfang@f8105b5.

If this works for you, let us know and I'll submit a pull request to get it merged.

@kohske
Contributor

kohske commented Nov 17, 2014

Yes, @saurfang is on the right track. We actually did the same in slidify: ramnathv/slidify@2434289

Here is a quick, temporary workaround:

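path <- tempfile(fileext = ".html")
con <- file(path, "w", encoding = "UTF-8")
writeLines(h2$render(), con)
close(con)  # flush the file before opening it and avoid a "closing unused connection" warning
browseURL(path)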

@ramnathv
Owner

Thanks @saurfang and @kohske. @saurfang, go ahead and send me a PR so this can be merged.

@saurfang linked a pull request on Nov 17, 2014 that will close this issue
@FingerLiu
Author

Thank you all very much! I will try it today and let you know whether it works, @saurfang.
Encoding problems in R were really driving me crazy until you helped!
Thanks again!

@FingerLiu
Author

@saurfang, I'm sorry to report that after installing your version of rCharts, nothing I draw with it (not even the demos on rcharts.io) shows up in the viewer, and R keeps warning me like this:

1: closing unused connection 5 (C:\Users\xxx\AppData\Local\Temp\Rtmp08kaj0\rCharts16f44297fa9/index.html)

I used the following commands to install your package:

remove.packages("rCharts")
devtools::install_github("saurfang/rCharts", ref = "utf8-writelines")

When I later reinstalled the master version of rCharts, everything worked again.
I don't know whether this is a bug or I did something wrong; I just wanted to let you know.

@yihui

yihui commented Nov 17, 2014

@FingerLiu See my comment at #552 (diff)

@kohske
Contributor

kohske commented Nov 17, 2014

writeLines is used in other places too, and those should be fixed as well. Perhaps we could have a single function that writes output to a file.

@saurfang

Ah yes, my bad. Thanks @yihui.
@FingerLiu, would you mind installing it again? It should work fine this time.

How about a write_file function that wraps writeLines, in the same spirit as the read_file function in utils.R?
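One way such a helper could look, as a hypothetical sketch rather than the code that ended up in #552: it writes through a connection that re-encodes to UTF-8 by default and uses `on.exit()` to close that connection even if `writeLines()` fails, so no open connection is left behind (presumably what the "closing unused connection" warning above was about). The name `write_file` and its argument order are assumptions.

```r
# Hypothetical write_file() helper: writes `lines` to `path`, re-encoding
# from the session's native encoding to `encoding` (UTF-8 by default)
write_file <- function(lines, path, encoding = "UTF-8") {
  con <- file(path, open = "w", encoding = encoding)
  # Close the connection on exit, even when writeLines() signals an error
  on.exit(close(con), add = TRUE)
  writeLines(lines, con)
  invisible(path)
}

# Usage: write a chart-like HTML fragment and read it back, marking it as UTF-8
out <- write_file("<p>\u5143\u8001</p>", tempfile(fileext = ".html"))
readLines(out, encoding = "UTF-8")
```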

@ramnathv
Owner

That sounds like a good idea @FingerLiu. Can you make that change in the PR?

@saurfang

Alright, I've made the changes: we now use write_file instead of writeLines, and it uses UTF-8 by default. Hopefully I have caught all the places.
I have tested it by printing a plot in RStudio, saving a plot to a file, showing a plot in R Markdown (which, to my understanding, shouldn't have been affected in the first place), using make_chart to write to a file, and publishing to a gist: http://rcharts.io/viewer/?41265ce302193c23cd77#.VGqrMXHF_OM

Would you mind giving it a code review and letting me know your thoughts? #552

@FingerLiu
Author

Hi @saurfang, thanks for fixing this. I tried again and it works fine with the example above.

But when I read a CSV encoded in UTF-8, it goes wrong again.
I converted the CSV above to UTF-8 and read it like this:

parsed_data <- read.csv("data//utf8.csv",header = TRUE,sep = ",",quote="",stringsAsFactors = FALSE,encoding = "utf-8")

Am I doing something wrong?
Can R only parse files encoded in the locale's encoding? Because of the reason you mentioned,

However, R on Windows doesn't appear to support a UTF-8 locale.

does that mean I cannot parse a UTF-8 file in R on Windows?

@kohske
Contributor

kohske commented Nov 18, 2014

@FingerLiu, try fileEncoding = 'utf-8' rather than encoding = 'utf-8'. Not sure whether it works well, though...
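The difference between the two arguments, as documented in `?read.table`: `fileEncoding` declares the encoding of the file so the text is re-encoded into the session's native encoding as it is read, while `encoding` only marks the strings that are read and does not convert them. A minimal sketch that writes a small UTF-8 CSV (a made-up one-row sample based on the data above) and reads it back with `fileEncoding`:

```r
# A small UTF-8 CSV, written as raw bytes so the session locale plays no
# part in how the file is stored
csv_path <- tempfile(fileext = ".csv")
csv_text <- enc2utf8('date,title\n"2014-10-07 8:42:37","\u5143\u8001"\n')
writeBin(charToRaw(csv_text), csv_path)

# fileEncoding: read through a connection that re-encodes from UTF-8
# into the session's native encoding
parsed <- read.csv(csv_path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
parsed$title
```

Because the text ends up in the native encoding, on a Chinese Windows session this only works for characters that code page 936 can represent.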


5 participants