Returns empty citation with some formats when citation contains certain cyrillic characters #49

Closed
3 tasks done
NateWr opened this issue Apr 2, 2018 · 7 comments

@NateWr

NateWr commented Apr 2, 2018

Please follow the general troubleshooting steps first:

  • I read the README and followed the instructions.
  • I am sure that the used CSL metadata follows the CSL schema.
  • I use a valid CSL stylesheet

Bug reports:

We're getting an empty citation returned when passing a citation for a journal article titled Est maximus eu donec congue “Nešto Između” Srđana Karanović“a. It returns:

<div class="csl-bib-body">
  <div class="csl-entry">.</div>
</div>

Used CSL stylesheet:

Can reproduce this with chicago-author-date.csl and turabian-fullnote-bibliography.csl.

Used CSL metadata

We pull the citation data from PHP, so here's the stdClass that I pass to CiteProc::render:

stdClass Object
(
    [type] => article-journal
    [id] => 30
    [title] => Est maximus eu donec congue \xe2\x80\x9cNe\xc5\xa1to Izme\xc4\x91u\xe2\x80\x9d Sr\xc4\x91ana Karanovi\xc4\x87\xe2\x80\x9ca
    [container-title] => Journal of Public Knowledge
    [container-title-short] => publicknowledge
    [volume] => 2
    [issue] => 3
    [section] => Articles
    [URL] => http://localhost/ojs/publicknowledge/article/view/30
    [accessed] => stdClass Object
        (
            [raw] => 2018-04-02
        )

    [author] => Array
        (
            [0] => stdClass Object
                (
                    [family] => Corino
                    [given] => Carlo
                )

            [1] => stdClass Object
                (
                    [family] => Contributor
                    [given] => Test
                )

            [2] => stdClass Object
                (
                    [family] => Another
                    [given] => Test
                )

        )

    [issued] => stdClass Object
        (
            [raw] => 2017-10-17 00:00:00
        )

    [DOI] => 10.1234/publicknowledge.v2i3.30
)
@seboettg
Owner

seboettg commented Apr 6, 2018

Hey Nate, I'm currently busy. I'll look for a fix soon.

@NateWr
Author

NateWr commented Apr 9, 2018

No problem, I'm the same. Thanks for looking into it. 👍

@seboettg
Owner

I'm not able to convert the array dump into JSON. Please call json_encode on that stdClass object and post the result here. Thank you!

@seboettg seboettg self-assigned this Apr 10, 2018
@NateWr
Author

NateWr commented Apr 11, 2018

Sure, here it is in JSON. It looks like maybe the characters in the title have been transformed a bit, but it may also be that my test data has changed a bit (still get the same errors though):

{
	"type": "article-journal",
	"id": "30",
	"title": "Est maximus eu donec congue \u201cNe\u0161to Izme\u0111u\u201d Sr\u0111ana Karanovi\u0107\u201ca",
	"container-title": "Journal of Public Knowledge",
	"container-title-short": "publicknowledge",
	"volume": "2",
	"issue": "3",
	"section": "Articles",
	"URL": "http:\/\/localhost\/ojs\/publicknowledge\/article\/view\/30",
	"accessed": {
		"raw": "2018-04-11"
	},
	"author": [{
		"family": "Corino",
		"given": "Carlo"
	}, {
		"family": "Contributor",
		"given": "Test"
	}, {
		"family": "Another",
		"given": "Test"
	}],
	"issued": {
		"raw": "2017-10-17 00:00:00"
	},
	"DOI": "10.1234\/publicknowledge.v2i3.30"
}
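
For readers wondering about the escaped title: PHP's json_encode escapes non-ASCII characters and forward slashes by default, so the escape sequences do not by themselves mean the data changed. The following is a minimal sketch, assuming a cut-down stand-in object (`$item` and its field values are rebuilt by hand from the dump, not taken from OJS):

```php
<?php
// Hypothetical stand-in for the citation object; only a few fields are rebuilt here.
$item = new stdClass();
$item->type = 'article-journal';
$item->id = '30';
$item->title = "Est maximus eu donec congue \u{201C}Ne\u{0161}to Izme\u{0111}u\u{201D}";
$item->URL = 'http://localhost/ojs/publicknowledge/article/view/30';

// Default flags: non-ASCII characters become \uXXXX escapes and "/" becomes "\/".
$escaped = json_encode($item);

// Optional flags keep the characters and slashes readable when posting a report.
$readable = json_encode(
    $item,
    JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
);

echo $escaped, PHP_EOL, $readable, PHP_EOL;

// Both encodings decode back to the same title string.
var_dump(json_decode($escaped)->title === json_decode($readable)->title);
```

Either form should be equivalent input once it has been passed through json_decode.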

@seboettg seboettg added the bug label Apr 14, 2018
seboettg pushed a commit that referenced this issue Apr 14, 2018
now checks if the first character of a word is a valid utf-8 letter that can be uppercased, before running ``mb_strtoupper``.
@seboettg
Owner

Okay, I've determined the problem.

The stylesheet specifies that the title should be formatted using text-case="title":

<text variable="title" quotes="true" text-case="title"/>

For uppercase strings, the first character of each word remains capitalized. All other letters are lowercased.
For lower or mixed case strings, the first character of each lowercase word is capitalized. The case of words in mixed or uppercase stays the same.
(Have a look at the CSL specification.)

To do this, I split the title into individual words at spaces and hyphens. The mb_strtoupper step could not handle a word like “Nešto, because its first character, “, is not a valid letter that can be uppercased, so the complete string was destroyed. Now citeproc-php first checks whether the first character of a word is a valid UTF-8 letter that can be uppercased before running mb_strtoupper.
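
A rough sketch of that kind of guard, for illustration only. This is not the code from the commit: `capitalizeWordSafely` and `capitalizeWords` are hypothetical helper names, and the sketch leaves out the rest of the CSL title-case rules (stop words, all-uppercase input).

```php
<?php
// Hypothetical helper, not the actual citeproc-php code: uppercase the first
// character of a word only when it is a lowercase Unicode letter.
function capitalizeWordSafely(string $word): string
{
    if ($word === '') {
        return $word;
    }
    // mb_substr counts characters, not bytes, so multi-byte UTF-8 stays intact.
    $first = mb_substr($word, 0, 1, 'UTF-8');
    $rest  = mb_substr($word, 1, null, 'UTF-8');

    // \p{Ll} matches any lowercase letter; the /u modifier enables UTF-8 mode.
    if (preg_match('/^\p{Ll}$/u', $first) === 1) {
        return mb_strtoupper($first, 'UTF-8') . $rest;
    }
    // Quotes, digits, and already-capitalized letters are left untouched.
    return $word;
}

// Hypothetical helper: apply the guard to each space- or hyphen-separated word.
function capitalizeWords(string $title): string
{
    // Capture the delimiters so the title can be reassembled unchanged.
    $parts = preg_split('/([ -])/u', $title, -1, PREG_SPLIT_DELIM_CAPTURE);
    return implode('', array_map('capitalizeWordSafely', $parts));
}

echo capitalizeWords('est maximus “Nešto između” šta-više'), PHP_EOL;
// prints: Est Maximus “Nešto Između” Šta-Više
```

The word “Nešto passes through unchanged because its first character is a quotation mark, while the words that start with a lowercase letter are capitalized.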

Please test if the issue is solved by using the branch issue-49.

If everything is fine, I will make a bugfix release soon.

@NateWr
Author

NateWr commented Apr 16, 2018

Fix confirmed. Thanks for another quick fix! I'll look forward to the next release. 👍

@seboettg
Owner

Okay! 2.1.2 has been released.
