utf-8.c File Reference

Functions for checking that strings contain UTF-8 characters only.

#include "utf-8.h"
#include <stdlib.h>
#include <string.h>
#include "StackTrace.h"

Macros

#define ARRAY_SIZE(a)   (sizeof(a) / sizeof(a[0]))
 

Functions

static const char * UTF8_char_validate (int len, const char *data)
 
int UTF8_validate (int len, const char *data)
 
int UTF8_validateString (const char *string)
 

Variables

struct {
   int   len
   struct {
      char   lower
      char   upper
   }   bytes [4]
}   valid_ranges []
 

Detailed Description

Functions for checking that strings contain UTF-8 characters only.

See page 104 of the Unicode Standard, Version 5.0, for the list of well-formed UTF-8 byte sequences.

Definition in file utf-8.c.

Macro Definition Documentation

#define ARRAY_SIZE(a)   (sizeof(a) / sizeof(a[0]))

Macro to determine the number of elements in a single-dimension array

Definition at line 37 of file utf-8.c.
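
The macro can be illustrated with a minimal sketch. The table and helper names below (sample_table, sum_sample_table, array_size_example) are hypothetical and not part of utf-8.c; only the macro itself is taken from this page.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))

/* Hypothetical table, used only for this illustration. */
static const int sample_table[] = {2, 3, 5, 7, 11};

/* Sums every element; the loop bound comes from the array itself,
   so adding a row to the table needs no other change. */
static int sum_sample_table(void)
{
    int total = 0;
    size_t i;

    for (i = 0; i < ARRAY_SIZE(sample_table); ++i)
        total += sample_table[i];
    return total;
}

/* Usage: the macro yields the element count, not the byte count. */
static void array_size_example(void)
{
    assert(ARRAY_SIZE(sample_table) == 5);
    assert(sum_sample_table() == 28);
}
```

The macro only works on a true array that is visible at the point of use. Applied to a pointer, such as an array passed as a function parameter, it divides the pointer size by the element size and gives a wrong count.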

Function Documentation

static const char * UTF8_char_validate (int len, const char *data)

Validate a single UTF-8 character

Parameters
len    the length of the string in "data"
data   the bytes to check for a valid UTF-8 char
Returns
pointer to the start of the next UTF-8 character in "data"

Definition at line 76 of file utf-8.c.
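
A table-driven check of one character could look roughly like the sketch below. It is not the code from utf-8.c. The names sketch_ranges, sketch_char_validate and char_validate_example are hypothetical, the table uses unsigned char instead of char so that values above 0x7F compare portably, and returning NULL for an invalid or truncated character is an assumption, since this page only documents the success case.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))

/* Same rows as valid_ranges, with unsigned bytes (assumption of this sketch). */
static const struct {
    int len;
    struct { unsigned char lower, upper; } bytes[4];
} sketch_ranges[] = {
    {1, { {0x00, 0x7F} } },
    {2, { {0xC2, 0xDF}, {0x80, 0xBF} } },
    {3, { {0xE0, 0xE0}, {0xA0, 0xBF}, {0x80, 0xBF} } },
    {3, { {0xE1, 0xEC}, {0x80, 0xBF}, {0x80, 0xBF} } },
    {3, { {0xED, 0xED}, {0x80, 0x9F}, {0x80, 0xBF} } },
    {3, { {0xEE, 0xEF}, {0x80, 0xBF}, {0x80, 0xBF} } },
    {4, { {0xF0, 0xF0}, {0x90, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF} } },
    {4, { {0xF1, 0xF3}, {0x80, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF} } },
    {4, { {0xF4, 0xF4}, {0x80, 0x8F}, {0x80, 0xBF}, {0x80, 0xBF} } },
};

/* Hypothetical stand-in for UTF8_char_validate. Returns a pointer just past
   the character, or NULL (assumed convention) if no row of the table matches. */
static const char *sketch_char_validate(int len, const char *data)
{
    size_t i;
    int j;

    if (data == NULL || len < 1)
        return NULL;
    for (i = 0; i < ARRAY_SIZE(sketch_ranges); ++i) {
        int n = sketch_ranges[i].len;

        if (n > len)
            continue;               /* not enough bytes left for this row */
        for (j = 0; j < n; ++j) {
            unsigned char b = (unsigned char)data[j];

            if (b < sketch_ranges[i].bytes[j].lower ||
                b > sketch_ranges[i].bytes[j].upper)
                break;              /* byte j falls outside this row */
        }
        if (j == n)
            return data + n;        /* every byte matched this row */
    }
    return NULL;
}

/* Usage: one accepted character and three rejected ones. */
static void char_validate_example(void)
{
    const char *e_acute = "\xC3\xA9";                         /* U+00E9 */

    assert(sketch_char_validate(2, e_acute) == e_acute + 2);
    assert(sketch_char_validate(2, "\xC0\x80") == NULL);      /* overlong */
    assert(sketch_char_validate(3, "\xED\xA0\x80") == NULL);  /* surrogate */
    assert(sketch_char_validate(1, e_acute) == NULL);         /* truncated */
}
```

Because the first-byte ranges of the rows do not overlap, at most one row can match a given lead byte, so the order in which the rows are tried does not affect the result.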

int UTF8_validate (int len, const char *data)

Validate that a length-delimited string contains only UTF-8 characters

Parameters
len    the length of the string in "data"
data   the bytes to check for valid UTF-8 characters
Returns
1 (true) if the string has only UTF-8 characters, 0 (false) otherwise

Definition at line 129 of file utf-8.c.

int UTF8_validateString (const char *string)

Validate that a null-terminated string contains only UTF-8 characters

Parameters
string   the string to check for valid UTF-8 characters
Returns
1 (true) if the string has only UTF-8 characters, 0 (false) otherwise

Definition at line 156 of file utf-8.c.
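
The relationship between the length-delimited and the null-terminated check can be sketched as follows. This is an illustration under stated assumptions, not the code from utf-8.c: all names are hypothetical, the per-character step is replaced by an ASCII-only stand-in to keep the example short, and the handling of empty input and NULL pointers is assumed rather than documented on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the per-character check: accepts ASCII only (hypothetical).
   Returns the position of the next character, or NULL on failure. */
static const char *stub_char_validate(int len, const char *data)
{
    if (len < 1 || ((unsigned char)data[0] & 0x80) != 0)
        return NULL;
    return data + 1;
}

/* Sketch of a length-delimited check: step through the buffer one
   character at a time until the end is reached or a character fails. */
static int sketch_validate(int len, const char *data)
{
    const char *cur = data;
    const char *end;

    if (len == 0)
        return 1;               /* assumption: empty input counts as valid */
    if (data == NULL || len < 0)
        return 0;               /* assumption: missing data is rejected */
    end = data + len;
    while (cur != NULL && cur < end)
        cur = stub_char_validate((int)(end - cur), cur);
    return cur != NULL;
}

/* Sketch of the null-terminated variant: measure the string, then delegate. */
static int sketch_validate_string(const char *string)
{
    if (string == NULL)
        return 0;               /* assumption: NULL is rejected */
    return sketch_validate((int)strlen(string), string);
}

/* Usage: the length-delimited form also checks bytes after an embedded NUL. */
static void validate_example(void)
{
    assert(sketch_validate_string("abc") == 1);
    assert(sketch_validate(3, "a\0b") == 1);
    assert(sketch_validate_string("a\x80") == 0);
}
```

The practical difference is the stopping point: the null-terminated form examines nothing beyond the first zero byte, while the length-delimited form examines exactly the number of bytes it is given.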

Variable Documentation

struct { ... } bytes[4]

up to 4 bytes can be used per character

int len

number of elements in the following array (1 to 4)

Definition at line 46 of file utf-8.c.

char lower

lower limit of valid range

Definition at line 49 of file utf-8.c.

char upper

upper limit of valid range

Definition at line 50 of file utf-8.c.

struct { ... } valid_ranges[]
Initial value:
=
{
{1, { {00, 0x7F} } },
{2, { {0xC2, 0xDF}, {0x80, 0xBF} } },
{3, { {0xE0, 0xE0}, {0xA0, 0xBF}, {0x80, 0xBF} } },
{3, { {0xE1, 0xEC}, {0x80, 0xBF}, {0x80, 0xBF} } },
{3, { {0xED, 0xED}, {0x80, 0x9F}, {0x80, 0xBF} } },
{3, { {0xEE, 0xEF}, {0x80, 0xBF}, {0x80, 0xBF} } },
{4, { {0xF0, 0xF0}, {0x90, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF} } },
{4, { {0xF1, 0xF3}, {0x80, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF} } },
{4, { {0xF4, 0xF4}, {0x80, 0x8F}, {0x80, 0xBF}, {0x80, 0xBF} } },
}

Structure to hold the valid ranges of UTF-8 characters, for each byte up to 4



plotjuggler
Author(s): Davide Faconti
autogenerated on Sun Dec 6 2020 04:02:49