Module unicode

An implementation of the Erlang/OTP unicode interface.

Description

This module implements a strict subset of the Erlang/OTP unicode interface.

Data Types

chardata()


chardata() = charlist() | unicode_binary()

charlist()


charlist() = maybe_improper_list(char() | unicode_binary() | charlist(), unicode_binary() | [])

encoding()


encoding() = utf8 | latin1

latin1_chardata()


latin1_chardata() = iodata()

unicode_binary()


unicode_binary() = binary()

Function Index

characters_to_binary/1: Convert character data to a UTF-8 binary.
characters_to_binary/2: Convert character data in a given encoding to a UTF-8 binary.
characters_to_binary/3: Convert character data in a given encoding to a binary in a given encoding.
characters_to_list/1: Convert UTF-8 data to a list of Unicode characters.
characters_to_list/2: Convert UTF-8 or Latin1 data to a list of Unicode characters.

Function Details

characters_to_binary/1


characters_to_binary(Data::chardata() | latin1_chardata()) -> unicode_binary() | {error, list(), chardata() | latin1_chardata() | list()} | {incomplete, unicode_binary(), chardata() | latin1_chardata()}

Data: data to convert to UTF8

returns: a UTF-8 binary, or a tuple if conversion failed.

Equivalent to characters_to_binary(Data, utf8, utf8).

Convert character data to a UTF-8 binary.

characters_to_binary/2


characters_to_binary(Data::chardata() | latin1_chardata(), InEncoding::encoding()) -> unicode_binary() | {error, list(), chardata() | latin1_chardata() | list()} | {incomplete, unicode_binary(), chardata() | latin1_chardata()}

Data: data to convert to UTF8
InEncoding: encoding of data

returns: a UTF-8 binary, or a tuple if conversion failed.

Equivalent to characters_to_binary(Data, InEncoding, utf8).

Convert character data in a given encoding to a UTF-8 binary.
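As a minimal sketch of the one- and two-argument forms, the module below encodes the same two characters from a list of code points and from a Latin1 binary. The module name unicode_binary_example and its function names are hypothetical, and the sketch assumes this implementation agrees with Erlang/OTP on these valid inputs.

```erlang
-module(unicode_binary_example).
-export([from_list/0, from_latin1/0]).

%% A list of code points: 104 ($h) followed by 233 (U+00E9).
%% U+00E9 takes two bytes in UTF-8: 195 (16#C3) and 169 (16#A9).
from_list() ->
    unicode:characters_to_binary([104, 233]).

%% The same text as a Latin1 binary, where the single byte 233 is U+00E9.
%% The input encoding is stated explicitly; the output is still UTF-8.
from_latin1() ->
    unicode:characters_to_binary(<<104, 233>>, latin1).
```

Both functions should produce the same three-byte UTF-8 binary, since only the input representation differs.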

characters_to_binary/3


characters_to_binary(Data::chardata() | latin1_chardata(), InEncoding::encoding(), OutEncoding::encoding()) -> unicode_binary() | {error, list(), chardata() | latin1_chardata() | list()} | {incomplete, unicode_binary(), chardata() | latin1_chardata()}

Data: data to convert
InEncoding: encoding of data
OutEncoding: output encoding

returns: an encoded binary or a tuple if conversion failed.

Convert character data in a given encoding to a binary in a given encoding.

If conversion fails, the function returns a tuple with three elements:

  • First element is error or incomplete. incomplete means the conversion failed because of an incomplete Unicode sequence at the very end of the data.

  • Second element is what has been converted so far.

  • Third element is the remaining data to be converted, for debugging purposes. This remaining data can differ from what Erlang/OTP returns.

Also, Erlang/OTP’s implementation may error with badarg for parameters for which this function merely returns an error tuple.
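The three result shapes described above can be handled with a single case expression. The following is a sketch only: the module unicode_result_example and the wrapper encode/3 are hypothetical names, and the wrapper deliberately discards the third tuple element because it is documented as debugging information that may differ from Erlang/OTP.

```erlang
-module(unicode_result_example).
-export([encode/3]).

%% Hypothetical wrapper that normalises the result of
%% unicode:characters_to_binary/3 into tagged two-element tuples.
encode(Data, InEncoding, OutEncoding) ->
    case unicode:characters_to_binary(Data, InEncoding, OutEncoding) of
        Bin when is_binary(Bin) ->
            %% Full conversion succeeded.
            {ok, Bin};
        {incomplete, Converted, _Rest} ->
            %% Input ended in the middle of a multi-byte sequence.
            {incomplete, Converted};
        {error, Converted, _Rest} ->
            %% Input contained an invalid sequence or character.
            {error, Converted}
    end.
```

For example, unicode_result_example:encode(<<104, 195>>, utf8, utf8) takes the incomplete branch, because byte 195 starts a two-byte UTF-8 sequence whose second byte is missing.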

characters_to_list/1


characters_to_list(Data::chardata() | latin1_chardata()) -> list() | {error, list(), chardata() | latin1_chardata() | list()} | {incomplete, list(), chardata() | latin1_chardata()}

Data: data to convert to Unicode

returns: a list of characters or a tuple if conversion failed.

Convert UTF-8 data to a list of Unicode characters.

If conversion fails, the function returns a tuple with three elements:

  • First element is error or incomplete. incomplete means the conversion failed because of an incomplete Unicode sequence at the very end of the data.

  • Second element is what has been converted so far.

  • Third element is the remaining data to be converted, for debugging purposes. This remaining data can differ from what Erlang/OTP returns.
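To illustrate the difference between a complete and a truncated input, here is a small sketch. The module name unicode_list_example and its functions are hypothetical. Only the second tuple element is inspected, since the third may differ from Erlang/OTP.

```erlang
-module(unicode_list_example).
-export([complete/0, truncated/0]).

%% Bytes 195, 169 are the UTF-8 encoding of U+00E9 (233),
%% so the whole binary decodes to two code points.
complete() ->
    unicode:characters_to_list(<<104, 195, 169>>).

%% The trailing byte 195 starts a two-byte sequence that never finishes,
%% so the call is expected to return an incomplete tuple.
%% Returns only the characters converted before the truncation.
truncated() ->
    {incomplete, Converted, _Rest} = unicode:characters_to_list(<<104, 195>>),
    Converted.
```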

characters_to_list/2


characters_to_list(Data::chardata() | latin1_chardata(), Encoding::encoding()) -> list() | {error, list(), chardata() | latin1_chardata() | list()} | {incomplete, list(), chardata() | latin1_chardata()}

Data: data to convert
Encoding: encoding of data to convert

returns: a list of characters or a tuple if conversion failed.

Convert UTF-8 or Latin1 data to a list of Unicode characters. Following Erlang/OTP, if the input encoding is latin1, this function returns an error tuple when a character greater than 255 is passed (in a list). Otherwise, it accepts any character within the Unicode range (0-0x10FFFF).

See also: characters_to_list/1.
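The effect of the input encoding on a list containing a character above 255 can be sketched as follows. The module unicode_latin1_example and the helper classify/2 are hypothetical names introduced for illustration.

```erlang
-module(unicode_latin1_example).
-export([classify/2]).

%% Hypothetical helper: returns ok when the conversion succeeds,
%% otherwise the tag of the failure tuple (error or incomplete).
classify(Chars, Encoding) ->
    case unicode:characters_to_list(Chars, Encoding) of
        List when is_list(List) ->
            ok;
        {Tag, _Converted, _Rest} ->
            Tag
    end.
```

With this helper, classify([104, 256], latin1) yields error because 256 does not fit in Latin1, while classify([104, 256], utf8) yields ok because 256 is within the Unicode range.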