Add #[DELIM[ … ]DELIM] syntax for string literals #1379

Merged 4 commits on Sep 15, 2017
2 changes: 2 additions & 0 deletions NEWS
@@ -10,6 +10,8 @@ Changes from 0.13.0
 longer names
 * Periods are no longer allowed in keywords
 * `eval` is now a function instead of a special form
+* Added a form of string literal called "bracket strings" delimited by
+`#[FOO[` and `]FOO]`, where `FOO` is customizable
 * The compiler now automatically promotes values to Hy model objects
 as necessary, so you can write ``(eval `(+ 1 ~n))`` instead of
 ``(eval `(+ 1 ~(HyInteger n)))``
47 changes: 34 additions & 13 deletions docs/language/api.rst
@@ -52,26 +52,47 @@ digits.

     (print 10,000,000,000 10_000_000_000)

-Unlike Python, Hy provides literal forms for NaN and infinity: `NaN`, `Inf`,
-and `-Inf`.
+Unlike Python, Hy provides literal forms for NaN and infinity: ``NaN``,
+``Inf``, and ``-Inf``.

 string literals
 ---------------

-Unlike Python, Hy allows only double-quoted strings (e.g., ``"hello"``). The
-single-quote character ``'`` is reserved for preventing the evaluation of a
-form (e.g., ``'(+ 1 1)``), as in most Lisps.
+Hy allows double-quoted strings (e.g., ``"hello"``), but not single-quoted
+strings like Python. The single-quote character ``'`` is reserved for
+preventing the evaluation of a form (e.g., ``'(+ 1 1)``), as in most Lisps.

 Python's so-called triple-quoted strings (e.g., ``'''hello'''`` and
 ``"""hello"""``) aren't supported. However, in Hy, unlike Python, any string
-literal can contain newlines.
-
-Whether running under Python 2 or Python 3, Hy treats string literals as
-sequences of Unicode characters by default, and allows you to prefix a literal
-with ``b`` to treat it as a sequence of bytes. So when running under Python 3,
-Hy translates ``"foo"`` and ``b"foo"`` to the identical Python code, but when
-running under Python 2, ``"foo"`` is translated to ``u"foo"`` and ``b"foo"`` is
-translated to ``"foo"``.
+literal can contain newlines. Furthermore, Hy supports an alternative form of
+string literal called a "bracket string" similar to Lua's long brackets.
+Bracket strings have customizable delimiters, like the here-documents of other
+languages. A bracket string begins with ``#[FOO[`` and ends with ``]FOO]``,
+where ``FOO`` is any string not containing ``[`` or ``]``, including the empty
+string. For example::
+
+    => (print #[["That's very kind of yuo [sic]" Tom wrote back.]])
+    "That's very kind of yuo [sic]" Tom wrote back.
+    => (print #[==[1 + 1 = 2]==])
+    1 + 1 = 2
+
+A bracket string can contain newlines, but if it begins with one, the newline
+is removed, so you can begin the content of a bracket string on the line
+following the opening delimiter with no effect on the content. Any leading
+newlines past the first are preserved.
+
+Plain string literals support :ref:`a variety of backslash escapes
+<py:strings>`. To create a "raw string" that interprets all backslashes
+literally, prefix the string with ``r``, as in ``r"slash\not"``. Bracket
+strings are always raw strings and don't allow the ``r`` prefix.
+
+Whether running under Python 2 or Python 3, Hy treats all string literals as
+sequences of Unicode characters by default, and allows you to prefix a plain
+string literal (but not a bracket string) with ``b`` to treat it as a sequence
+of bytes. So when running under Python 3, Hy translates ``"foo"`` and
+``b"foo"`` to the identical Python code, but when running under Python 2,
+``"foo"`` is translated to ``u"foo"`` and ``b"foo"`` is translated to
+``"foo"``.

 .. _syntax-keywords:

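For readers less familiar with raw-string semantics, here is a minimal Python sketch of the distinction the documentation change describes. It uses only Python's own literals, on the assumption (taken from the docs text above) that Hy's plain strings follow Python's backslash escapes. Nothing here calls Hy itself.

```python
# Plain literal: the escape \n collapses into a single newline character.
plain = "slash\not"

# Raw literal: the backslash and the "n" stay as two separate characters.
raw = r"slash\not"

print(len(plain))     # 8
print(len(raw))       # 9
print("\n" in plain)  # True
print("\\" in raw)    # True
```

By the docs above, a bracket string such as `#[X[slash\not]X]` would behave like the `raw` case, which matches the `raw\nstring` assertion in the new AST test.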
8 changes: 7 additions & 1 deletion docs/language/internals.rst
@@ -102,7 +102,7 @@ HyString
 ~~~~~~~~

 ``hy.models.HyString`` is the base class of string-equivalent Hy
-models. It also represents double-quoted string literals, ``""``, which
+models. It also represents string literals (including bracket strings), which
 compile down to unicode string literals in Python. ``HyStrings`` inherit
 unicode objects in Python 2, and string objects in Python 3 (and are
 therefore not encoding-dependent).
@@ -113,6 +113,12 @@ Hy literal strings can span multiple lines, and are considered by the
 parser as a single unit, respecting the Python escapes for unicode
 strings.

+``HyString``\s have an attribute ``brackets`` that stores the custom
+delimiter used for a bracket string (e.g., ``"=="`` for ``#[==[hello
+world]==]`` and the empty string for ``#[[hello world]]``).
+``HyString``\s that are not produced by bracket strings have their
+``brackets`` set to ``None``.
+
 HyBytes
 ~~~~~~~

6 changes: 6 additions & 0 deletions hy/compiler.py
@@ -745,6 +745,12 @@ def _render_quoted_form(self, form, level):
             return imports, HyExpression([HySymbol(name),
                                           HyString(form)]).replace(form), False

+        elif isinstance(form, HyString):
+            x = [HySymbol(name), form]
+            if form.brackets is not None:
+                x.extend([HyKeyword(":brackets"), form.brackets])
+            return imports, HyExpression(x).replace(form), False
+
         return imports, HyExpression([HySymbol(name),
                                       form]).replace(form), False

6 changes: 6 additions & 0 deletions hy/lex/lexer.py
@@ -27,6 +27,12 @@
 lg.add('UNQUOTE', r'~%s' % end_quote)
 lg.add('DISCARD', r'#_')
 lg.add('HASHSTARS', r'#\*+')
+lg.add('BRACKETSTRING', r'''(?x)
+    \# \[ ( [^\[\]]* ) \[    # Opening delimiter
+    \n?                      # A single leading newline will be ignored
+    ((?:\n|.)*?)             # Content of the string
+    \] \1 \]                 # Closing delimiter
+    ''')
 lg.add('HASHOTHER', r'#%s' % identifier)

 # A regexp which matches incomplete strings, used to support
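To see how the `BRACKETSTRING` pattern behaves in isolation, the sketch below compiles the same regular expression directly with Python's `re` module. The helper name `split_bracket_string` is hypothetical and not part of Hy. The sketch also passes `re.VERBOSE` as a flag rather than using the inline `(?x)`, which should be equivalent here.

```python
import re

# Same pattern as the BRACKETSTRING lexer rule, compiled standalone.
BRACKET_STRING_RE = re.compile(r'''
    \# \[ ( [^\[\]]* ) \[    # Opening delimiter; group 1 is the custom part
    \n?                      # A single leading newline will be ignored
    ((?:\n|.)*?)             # Content of the string (lazy, spans newlines)
    \] \1 \]                 # Closing delimiter must repeat group 1
    ''', re.VERBOSE)


def split_bracket_string(source):
    """Return (delimiter, content) for a bracket string at the start of
    `source`, or None when nothing matches. Hypothetical helper."""
    m = BRACKET_STRING_RE.match(source)
    return m.groups() if m else None


print(split_bracket_string('#[==[1 + 1 = 2]==]'))  # ('==', '1 + 1 = 2')
print(split_bracket_string('#[[squid]]'))          # ('', 'squid')
print(split_bracket_string('#[X[\n\nhi\n]X]'))     # ('X', '\nhi\n')
```

The backreference `\1` is what ties the closing delimiter to the opening one, and the lazy `*?` stops the content at the first matching closer, so a stray `]` inside the content is harmless as long as it is not followed by the delimiter and another `]`.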
9 changes: 9 additions & 0 deletions hy/lex/parser.py
@@ -281,6 +281,15 @@ def t_partial_string(p):
     raise PrematureEndOfInput("Premature end of input")


+bracket_string_re = next(r.re for r in lexer.rules if r.name == 'BRACKETSTRING')
+@pg.production("string : BRACKETSTRING")
+@set_boundaries
+def t_bracket_string(p):
+    m = bracket_string_re.match(p[0].value)
+    delim, content = m.groups()
+    return HyString(content, brackets=delim)
+
+
 @pg.production("identifier : IDENTIFIER")
 @set_boundaries
 def t_identifier(p):
5 changes: 4 additions & 1 deletion hy/models.py
@@ -65,7 +65,10 @@ class HyString(HyObject, str_type):
     scripts. It's either a ``str`` or a ``unicode``, depending on the
     Python version.
     """
-    pass
+    def __new__(cls, s=None, brackets=None):
+        value = super(HyString, cls).__new__(cls, s)
+        value.brackets = brackets
+        return value

 _wrappers[str_type] = HyString

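The `__new__` override above follows the usual pattern for attaching extra data to an immutable `str` subclass. Below is a minimal Python 3 sketch of that pattern. `TaggedString` is a hypothetical stand-in for `HyString` (without `HyObject`), and the `s=""` default is a choice made for this sketch, not what the diff uses.

```python
class TaggedString(str):
    """Hypothetical stand-in for HyString: a str carrying a `brackets` tag."""

    def __new__(cls, s="", brackets=None):
        # str is immutable, so the text has to be fixed in __new__;
        # the extra attribute is attached to the new instance afterwards.
        value = super().__new__(cls, s)
        value.brackets = brackets
        return value


plain = TaggedString("squid")
bracketed = TaggedString("squid", brackets="")

print(plain == bracketed)         # True: equality is still plain str equality
print(plain.brackets)             # None
print(repr(bracketed.brackets))   # ''
```

Because comparison falls through to `str`, two values with the same text compare equal whatever their `brackets`. This is consistent with the new lexer tests, which check `objs == [HyString("hello world")]` and the `brackets` attribute separately.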
36 changes: 30 additions & 6 deletions tests/compilers/test_ast.py
@@ -1,3 +1,4 @@
+# -*- encoding: utf-8 -*-
 # Copyright 2017 the authors.
 # This file is part of Hy, which is free software licensed under the Expat
 # license. See the LICENSE.
@@ -46,6 +47,10 @@ def cant_compile(expr):
         return e


+def s(x):
+    return can_compile(x).body[0].value.s
+
+
 def test_ast_bad_type():
     "Make sure AST breakage can happen"
     class C:
@@ -480,12 +485,31 @@ def _compile_string(s):


 def test_ast_unicode_vs_bytes():
-    def f(x): return can_compile(x).body[0].value.s
-    assert f('"hello"') == u"hello"
-    assert type(f('"hello"')) is (str if PY3 else unicode)  # noqa
-    assert f('b"hello"') == (eval('b"hello"') if PY3 else "hello")
-    assert type(f('b"hello"')) == (bytes if PY3 else str)
-    assert f('b"\\xa0"') == (bytes([160]) if PY3 else chr(160))
+    assert s('"hello"') == u"hello"
+    assert type(s('"hello"')) is (str if PY3 else unicode)  # noqa
+    assert s('b"hello"') == (eval('b"hello"') if PY3 else "hello")
+    assert type(s('b"hello"')) is (bytes if PY3 else str)
+    assert s('b"\\xa0"') == (bytes([160]) if PY3 else chr(160))
+
+
+def test_ast_bracket_string():
+    assert s(r'#[[empty delims]]') == 'empty delims'
+    assert s(r'#[my delim[fizzle]my delim]') == 'fizzle'
+    assert s(r'#[[]]') == ''
+    assert s(r'#[my delim[]my delim]') == ''
+    assert type(s('#[X[hello]X]')) is (str if PY3 else unicode)  # noqa
+    assert s(r'#[X[raw\nstring]X]') == 'raw\\nstring'
+    assert s(r'#[foozle[aa foozli bb ]foozle]') == 'aa foozli bb '
+    assert s(r'#[([unbalanced](]') == 'unbalanced'
+    assert s(r'#[(1💯@)} {a![hello world](1💯@)} {a!]') == 'hello world'
+    assert (s(r'''#[X[
+Remove the leading newline, please.
+]X]''') == 'Remove the leading newline, please.\n')
+    assert (s(r'''#[X[
+
+
+Only one leading newline should be removed.
+]X]''') == '\n\nOnly one leading newline should be removed.\n')


 def test_compile_error():
6 changes: 6 additions & 0 deletions tests/native_tests/language.hy
@@ -1197,6 +1197,12 @@
   (assert (= (eval `(get ~d ~k)) 2)))


+(defn test-quote-bracket-string-delim []
+  (assert (= (. '#[my delim[hello world]my delim] brackets) "my delim"))
+  (assert (= (. '#[[squid]] brackets) ""))
+  (assert (none? (. '"squid" brackets))))
+
+
 (defn test-import-syntax []
   "NATIVE: test the import syntax."

11 changes: 11 additions & 0 deletions tests/test_lex.py
@@ -69,6 +69,17 @@ def test_lex_strings():
     assert objs == [HyString("abc")]


+def test_lex_bracket_strings():
+
+    objs = tokenize("#[my delim[hello world]my delim]")
+    assert objs == [HyString("hello world")]
+    assert objs[0].brackets == "my delim"
+
+    objs = tokenize("#[[squid]]")
+    assert objs == [HyString("squid")]
+    assert objs[0].brackets == ""
+
+
 def test_lex_integers():
     """ Make sure that integers are valid expressions"""
     objs = tokenize("42 ")