lexical analysis
#include <json.hpp>
Public Member Functions
Member | Description |
---|---|
constexpr const char * get_error_message () const noexcept | return syntax error message |
constexpr number_float_t get_number_float () const noexcept | return floating-point value |
constexpr number_integer_t get_number_integer () const noexcept | return integer value |
constexpr number_unsigned_t get_number_unsigned () const noexcept | return unsigned integer value |
constexpr std::size_t get_position () const noexcept | return position of last read token |
std::string get_token_string () const | |
lexer (detail::input_adapter_t adapter) | constructor |
lexer (const lexer &)=delete | copy constructor (deleted) |
string_t && move_string () | return current string value (implicitly resets the token; useful only once) |
lexer & operator= (lexer &)=delete | copy assignment operator (deleted) |
token_type scan () | |
Static Public Member Functions
Member | Description |
---|---|
static const char * token_type_name (const token_type t) noexcept | return name of values of type token_type (only used for errors) |
Private Types
using number_float_t = typename BasicJsonType::number_float_t
using number_integer_t = typename BasicJsonType::number_integer_t
using number_unsigned_t = typename BasicJsonType::number_unsigned_t
using string_t = typename BasicJsonType::string_t
Private Member Functions
Member | Description |
---|---|
void add (int c) | add a character to token_buffer |
std::char_traits< char >::int_type get () | |
int get_codepoint () | get codepoint from 4 hex characters following \u |
bool next_byte_in_range (std::initializer_list< int > ranges) | check if the next byte(s) are inside a given range |
void reset () noexcept | reset token_buffer; current character is beginning of token |
token_type scan_literal (const char *literal_text, const std::size_t length, token_type return_type) | |
token_type scan_number () | scan a number literal |
token_type scan_string () | scan a string literal |
void unget () | unget current character (return it again on next get) |
Static Private Member Functions
Member | Description |
---|---|
static char get_decimal_point () noexcept | return the locale-dependent decimal point |
static void strtof (float &f, const char *str, char **endptr) noexcept | |
static void strtof (double &f, const char *str, char **endptr) noexcept | |
static void strtof (long double &f, const char *str, char **endptr) noexcept | |
Private Attributes
Member | Description |
---|---|
std::size_t chars_read = 0 | the number of characters read |
std::char_traits< char >::int_type current = std::char_traits<char>::eof() | the current character |
const char decimal_point_char = '.' | the decimal point |
const char * error_message = "" | a description of the lexer error that occurred |
detail::input_adapter_t ia = nullptr | input adapter |
string_t token_buffer {} | buffer for variable-length tokens (numbers, strings) |
std::vector< char > token_string {} | raw input token string (for error messages) |
number_float_t value_float = 0 | |
number_integer_t value_integer = 0 | |
number_unsigned_t value_unsigned = 0 | |
lexical analysis
This class organizes the lexical analysis during JSON deserialization.
enum class token_type
token types for the parser
Enumerator | Description |
---|---|
uninitialized | indicating the scanner is uninitialized |
literal_true | the literal true |
literal_false | the literal false |
literal_null | the literal null |
value_string | a string; use move_string() for the actual value |
value_unsigned | an unsigned integer; use get_number_unsigned() for the actual value |
value_integer | a signed integer; use get_number_integer() for the actual value |
value_float | a floating-point number; use get_number_float() for the actual value |
begin_array | the character for array begin ([) |
begin_object | the character for object begin ({) |
end_array | the character for array end (]) |
end_object | the character for object end (}) |
name_separator | the name separator (:) |
value_separator | the value separator (,) |
parse_error | indicating a parse error |
end_of_input | indicating the end of the input buffer |
literal_or_value | a literal or the beginning of a value (only for diagnostics) |
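A minimal sketch of how such a strongly typed enum and a diagnostic name lookup (the role token_type_name() plays) might look. The enumerators follow the table above; the returned strings are illustrative assumptions, not necessarily those produced by json.hpp.

```cpp
#include <cassert>

// Sketch of the strongly typed token enum described above.
enum class token_type
{
    uninitialized, literal_true, literal_false, literal_null,
    value_string, value_unsigned, value_integer, value_float,
    begin_array, begin_object, end_array, end_object,
    name_separator, value_separator, parse_error, end_of_input,
    literal_or_value
};

// Illustrative name lookup for error messages; the strings are assumptions.
const char* token_type_name(const token_type t) noexcept
{
    switch (t)
    {
        case token_type::uninitialized:    return "<uninitialized>";
        case token_type::literal_true:     return "true literal";
        case token_type::literal_false:    return "false literal";
        case token_type::literal_null:     return "null literal";
        case token_type::value_string:     return "string literal";
        case token_type::value_unsigned:
        case token_type::value_integer:
        case token_type::value_float:      return "number literal";
        case token_type::begin_array:      return "'['";
        case token_type::begin_object:     return "'{'";
        case token_type::end_array:        return "']'";
        case token_type::end_object:       return "'}'";
        case token_type::name_separator:   return "':'";
        case token_type::value_separator:  return "','";
        case token_type::parse_error:      return "<parse error>";
        case token_type::end_of_input:     return "end of input";
        case token_type::literal_or_value: return "'[', '{', or a literal";
    }
    return "unknown token";  // not reached for valid enumerators
}
```

In this sketch, the three number token types share one name, since an error message only needs to say that a number was found.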
get codepoint from 4 hex characters following \u
For input "\u c1 c2 c3 c4", the codepoint is: (c1 * 0x1000) + (c2 * 0x0100) + (c3 * 0x0010) + c4 = (c1 << 12) + (c2 << 8) + (c3 << 4) + (c4 << 0)
Furthermore, the possible characters '0'..'9', 'A'..'F', and 'a'..'f' must be converted to the integers 0x0..0x9, 0xA..0xF, and 0xA..0xF, respectively. The conversion is done by subtracting the offset (0x30, 0x37, and 0x57, respectively) between the ASCII value of the character and the desired integer value.
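The conversion can be sketched as a standalone function. This is an illustration under assumptions: the name hex4_to_codepoint and its std::string/position interface are hypothetical (the lexer reads its characters through the input adapter instead), and returning -1 for invalid input is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical helper: decode the four hex digits starting at s[pos]
// (the characters after the backslash-u) into a codepoint 0x0000..0xFFFF.
// Returns -1 if fewer than four characters remain or one is not a hex digit.
int hex4_to_codepoint(const std::string& s, std::size_t pos)
{
    if (pos > s.size() || s.size() - pos < 4)
    {
        return -1;
    }

    int codepoint = 0;
    // Shift amounts for c1..c4: c1 << 12, c2 << 8, c3 << 4, c4 << 0.
    for (const int shift : {12, 8, 4, 0})
    {
        const char c = s[pos++];
        int digit = 0;
        if (c >= '0' && c <= '9')
        {
            digit = c - 0x30;  // '0'..'9' -> 0x0..0x9
        }
        else if (c >= 'A' && c <= 'F')
        {
            digit = c - 0x37;  // 'A'..'F' -> 0xA..0xF
        }
        else if (c >= 'a' && c <= 'f')
        {
            digit = c - 0x57;  // 'a'..'f' -> 0xA..0xF
        }
        else
        {
            return -1;  // not a hex digit
        }
        codepoint += digit << shift;
    }

    assert(codepoint >= 0x0000 && codepoint <= 0xFFFF);
    return codepoint;
}
```

For example, the digits "00e9" yield 0xE0 + 0x9 = 0xE9.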
check if the next byte(s) are inside a given range
Adds the current byte and, for each passed range, reads a new byte and checks if it is inside the range. If a violation is detected, it sets up an error message and returns false; otherwise, it returns true.
Parameters
[in] | ranges | list of integers, interpreted as a list of pairs of inclusive lower and upper bounds |
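The following sketch mimics this check on a plain std::string. The byte_cursor type, its members, and the error text are hypothetical stand-ins for the lexer's input adapter, token_buffer, and error_message. A typical use is validating a multi-byte UTF-8 sequence: the lead byte E2 of the euro sign (E2 82 AC) must be followed by two bytes in 0x80..0xBF.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical stand-in for the lexer state: the input, a read position,
// the token buffer, and the error message.
struct byte_cursor
{
    std::string input;
    std::size_t pos = 0;
    std::string token_buffer;
    const char* error_message = "";

    // Next byte as 0..255, or -1 once the input is exhausted (like EOF).
    int get()
    {
        return pos < input.size() ? static_cast<unsigned char>(input[pos++]) : -1;
    }

    // `current` is the byte already read; `ranges` lists inclusive
    // [lower, upper] pairs, one pair per byte that must follow.
    bool next_byte_in_range(int current, std::initializer_list<int> ranges)
    {
        assert(ranges.size() % 2 == 0);
        token_buffer.push_back(static_cast<char>(current));

        for (auto range = ranges.begin(); range != ranges.end(); ++range)
        {
            const int lower = *range;
            const int upper = *(++range);
            const int next = get();
            if (next < lower || next > upper)
            {
                error_message = "invalid string: ill-formed UTF-8 byte";  // hypothetical text
                return false;
            }
            token_buffer.push_back(static_cast<char>(next));
        }
        return true;
    }
};
```

With this sketch, calling next_byte_in_range(0xE2, {0x80, 0xBF, 0x80, 0xBF}) accepts the remaining bytes 82 AC and rejects a following 41, which lies outside 0x80..0xBF.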
scan a number literal
This function scans a string according to Sect. 6 of RFC 7159.
The function is realized with a deterministic finite state machine derived from the grammar described in RFC 7159. Starting in state "init", the input is read and used to determine the next state. Only state "done" accepts the number. State "error" is a trap state to model errors. In the table below, "anything" means any character other than the ones listed before.
state | 0 | 1-9 | e E | + | - | . | anything |
---|---|---|---|---|---|---|---|
init | zero | any1 | [error] | [error] | minus | [error] | [error] |
minus | zero | any1 | [error] | [error] | [error] | [error] | [error] |
zero | done | done | exponent | done | done | decimal1 | done |
any1 | any1 | any1 | exponent | done | done | decimal1 | done |
decimal1 | decimal2 | decimal2 | [error] | [error] | [error] | [error] | [error] |
decimal2 | decimal2 | decimal2 | exponent | done | done | done | done |
exponent | any2 | any2 | [error] | sign | sign | [error] | [error] |
sign | any2 | any2 | [error] | [error] | [error] | [error] | [error] |
any2 | any2 | any2 | done | done | done | done | done |
The state machine is realized with one label per state (prefixed with "scan_number_") and goto statements between them. The state machine contains cycles, but any cycle can be left when EOF is read. Therefore, the function is guaranteed to terminate.
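The state machine can be sketched in C++ as a loop over a switch, rather than the labels and goto statements the lexer uses. This is a minimal sketch, assuming a std::string input that starts at the first character of the number; number_kind and scan_number_sketch are hypothetical names. Any digit after the decimal point leads to decimal2, as the RFC 7159 grammar (frac = decimal-point 1*DIGIT) requires. As in the "done" state, the character that ends the number is not consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type, named after the lexer's token types.
enum class number_kind { value_unsigned, value_integer, value_float, parse_error };

// Runs the number state machine over s from index 0. On success, `length` is
// the number of characters that form the number. On error, `length` is the
// index of the offending character.
number_kind scan_number_sketch(const std::string& s, std::size_t& length)
{
    enum class state { init, minus, zero, any1, decimal1, decimal2, exponent, sign, any2 };
    state st = state::init;
    number_kind kind = number_kind::value_unsigned;

    for (std::size_t i = 0;; ++i)
    {
        // Past the end of the input, the next character counts as "anything".
        const char c = i < s.size() ? s[i] : '\0';
        const bool is_digit = c >= '0' && c <= '9';
        const bool is_exp = c == 'e' || c == 'E';

        switch (st)
        {
            case state::init:
            case state::minus:
                if (st == state::init && c == '-') { kind = number_kind::value_integer; st = state::minus; }
                else if (c == '0') { st = state::zero; }
                else if (is_digit) { st = state::any1; }
                else { length = i; return number_kind::parse_error; }
                break;

            case state::zero:
            case state::any1:
                if (st == state::any1 && is_digit) { break; }  // any1 loops on digits
                if (c == '.') { kind = number_kind::value_float; st = state::decimal1; }
                else if (is_exp) { kind = number_kind::value_float; st = state::exponent; }
                else { length = i; return kind; }  // "done"
                break;

            case state::decimal1:
                if (!is_digit) { length = i; return number_kind::parse_error; }
                st = state::decimal2;
                break;

            case state::decimal2:
                if (is_digit) { break; }
                if (is_exp) { st = state::exponent; break; }
                length = i;
                return kind;  // "done"

            case state::exponent:
                if (c == '+' || c == '-') { st = state::sign; }
                else if (is_digit) { st = state::any2; }
                else { length = i; return number_kind::parse_error; }
                break;

            case state::sign:
                if (!is_digit) { length = i; return number_kind::parse_error; }
                st = state::any2;
                break;

            case state::any2:
                if (is_digit) { break; }
                length = i;
                return kind;  // "done"
        }
    }
}
```

For example, "-12," yields value_integer with length 3, and "01" yields value_unsigned with length 1, leaving the second digit for the next token.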
During scanning, the read bytes are stored in token_buffer. This string is then converted to a signed integer, an unsigned integer, or a floating-point number.
Note: internally, the locale's decimal point (see get_decimal_point()) is used instead of '.' to work with the locale-dependent converters.
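The conversion step and the locale handling can be sketched with the C library converters. Assumptions: the helper names, the number_value struct, and the is_integer/is_negative flags (which the state machine would supply) are hypothetical, and this sketch substitutes the locale's decimal point after scanning. In this sketch, integers that do not fit (errno set to ERANGE) fall back to floating point.

```cpp
#include <cassert>
#include <cerrno>
#include <clocale>
#include <cstdint>
#include <cstdlib>
#include <string>

// Decimal point of the current C locale, as expected by std::strtod.
char locale_decimal_point() noexcept
{
    const std::lconv* loc = std::localeconv();
    return (loc == nullptr || loc->decimal_point == nullptr) ? '.' : *loc->decimal_point;
}

// Hypothetical result type for this sketch.
struct number_value
{
    enum kind_t { unsigned_int, signed_int, floating };
    kind_t kind = floating;
    std::uint64_t u = 0;
    std::int64_t i = 0;
    double f = 0.0;
};

// Converts a scanned number token. is_integer means no '.' or exponent was
// seen; is_negative means the token starts with '-'.
number_value convert_number(std::string token, bool is_integer, bool is_negative)
{
    number_value result;
    char* end = nullptr;

    if (is_integer)
    {
        errno = 0;
        if (!is_negative)
        {
            const unsigned long long x = std::strtoull(token.c_str(), &end, 10);
            if (errno == 0) { result.kind = number_value::unsigned_int; result.u = x; return result; }
        }
        else
        {
            const long long x = std::strtoll(token.c_str(), &end, 10);
            if (errno == 0) { result.kind = number_value::signed_int; result.i = x; return result; }
        }
        // Out of range: fall through to floating point.
    }

    // strtod expects the locale's decimal point, so substitute it for '.'.
    const char point = locale_decimal_point();
    for (char& ch : token)
    {
        if (ch == '.') { ch = point; }
    }
    result.kind = number_value::floating;
    result.f = std::strtod(token.c_str(), &end);
    assert(end == token.c_str() + token.size());  // whole token consumed
    return result;
}
```

With this sketch, "18446744073709551616" (2^64) overflows std::uint64_t and is therefore returned as a double.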
scan a string literal
This function scans a string according to Sect. 7 of RFC 7159. While scanning, escape sequences are decoded and the resulting bytes are copied into token_buffer. When the function returns successfully, token_buffer is not null-terminated (as it may contain \0 bytes), and token_buffer.size() is the number of bytes in the string.
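A simplified scanner along these lines can be sketched as follows. Assumptions: scan_string_sketch and its interface (input string, position just after the opening quote, output buffer) are hypothetical; \u escapes are limited to the Basic Multilingual Plane, so surrogate pairs are rejected rather than combined; and raw bytes of 0x20 and above are copied without UTF-8 validation. Because std::string stores its size explicitly, an escaped \u0000 survives as an embedded zero byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Scans a JSON string body starting at in[pos], just after the opening quote.
// Returns true at the closing quote; the decoded bytes are in `out`.
bool scan_string_sketch(const std::string& in, std::size_t pos, std::string& out)
{
    out.clear();
    while (pos < in.size())
    {
        const unsigned char c = static_cast<unsigned char>(in[pos++]);
        if (c == '"') { return true; }   // closing quote
        if (c < 0x20) { return false; }  // control characters must be escaped
        if (c != '\\')
        {
            out.push_back(static_cast<char>(c));  // copied as-is (no UTF-8 check)
            continue;
        }
        if (pos >= in.size()) { return false; }  // input ends after a backslash

        switch (in[pos++])
        {
            case '"':  out.push_back('"');  break;
            case '\\': out.push_back('\\'); break;
            case '/':  out.push_back('/');  break;
            case 'b':  out.push_back('\b'); break;
            case 'f':  out.push_back('\f'); break;
            case 'n':  out.push_back('\n'); break;
            case 'r':  out.push_back('\r'); break;
            case 't':  out.push_back('\t'); break;
            case 'u':
            {
                // Four hex digits follow the backslash-u.
                if (in.size() - pos < 4) { return false; }
                unsigned cp = 0;
                for (int k = 0; k < 4; ++k)
                {
                    const char h = in[pos++];
                    cp <<= 4;
                    if (h >= '0' && h <= '9')      { cp += static_cast<unsigned>(h - '0'); }
                    else if (h >= 'A' && h <= 'F') { cp += static_cast<unsigned>(h - 'A' + 10); }
                    else if (h >= 'a' && h <= 'f') { cp += static_cast<unsigned>(h - 'a' + 10); }
                    else { return false; }
                }
                if (cp >= 0xD800 && cp <= 0xDFFF) { return false; }  // surrogates: not handled here

                // UTF-8 encoding of a BMP codepoint (1 to 3 bytes).
                if (cp < 0x80)
                {
                    out.push_back(static_cast<char>(cp));
                }
                else if (cp < 0x800)
                {
                    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
                    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
                }
                else
                {
                    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
                    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
                    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
                }
                break;
            }
            default:
                return false;  // invalid escape sequence
        }
    }
    return false;  // no closing quote
}
```

For example, the body of "\u00e9" decodes to the two bytes C3 A9, and the body of "\u0000x" decodes to a zero byte followed by x, giving a size of 2.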