This directory contains import files that define character-matching rules. We recommend importing these files after the last rule in the PEG file that imports them.
char/ascii_character_group.peg
An import file that defines rules to match an ASCII character belonging to a specific character group.
The following PEG rules are available.
Rule Name | Description |
---|---|
`ASCII_Printable_Character` | Matches a printable character, i.e. an ASCII character other than a control character. |
`ASCII_Letter` | Matches an alphabetic character (`[A-Za-z]`). |
`ASCII_Control_Character` | Matches a control character (`[\x00-\x1f\x7f]`). |
`ASCII_Special_Character` | Matches a printable character that is neither a digit nor an alphabetic character. |
`ASCII_Number` | Matches a digit (`[0-9]`). |
`ASCII_Uppercase_Letter` | Matches an uppercase alphabetic character (`[A-Z]`). |
`ASCII_Lowercase_Letter` | Matches a lowercase alphabetic character (`[a-z]`). |
`ASCII_C_alnum` | Matches a character for which the standard C function `isalnum()` returns a non-zero value (`[0-9A-Za-z]`). |
`ASCII_C_alpha` | Matches a character for which the standard C function `isalpha()` returns a non-zero value (= `ASCII_Letter`). |
`ASCII_C_blank` | Matches a character for which the standard C function `isblank()` returns a non-zero value (`[ \t]`). |
`ASCII_C_cntrl` | Matches a character for which the standard C function `iscntrl()` returns a non-zero value (= `ASCII_Control_Character`). |
`ASCII_C_digit` | Matches a character for which the standard C function `isdigit()` returns a non-zero value (= `ASCII_Number`). |
`ASCII_C_graph` | Matches a character for which the standard C function `isgraph()` returns a non-zero value (= `ASCII_Printable_Character` excluding the space character `' '`). |
`ASCII_C_lower` | Matches a character for which the standard C function `islower()` returns a non-zero value (= `ASCII_Lowercase_Letter`). |
`ASCII_C_print` | Matches a character for which the standard C function `isprint()` returns a non-zero value (= `ASCII_Printable_Character`). |
`ASCII_C_punct` | Matches a character for which the standard C function `ispunct()` returns a non-zero value (= `ASCII_Special_Character` excluding the space character `' '`). |
`ASCII_C_space` | Matches a character for which the standard C function `isspace()` returns a non-zero value (`[ \t\n\v\f\r]`). |
`ASCII_C_upper` | Matches a character for which the standard C function `isupper()` returns a non-zero value (= `ASCII_Uppercase_Letter`). |
`ASCII_C_xdigit` | Matches a character for which the standard C function `isxdigit()` returns a non-zero value (`[0-9A-Fa-f]`). |
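
The equivalences stated in the table can be checked against `<ctype.h>` directly. The following is a minimal sketch, not part of the import file: the helper names (`is_ascii_control`, `count_mismatches`, and so on) are hypothetical, and the code assumes an ASCII execution character set and the default `"C"` locale, in which the `<ctype.h>` functions classify only the ASCII range as described above.

```c
#include <assert.h>   /* used only by the self-check shown in the usage note */
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helpers that restate the table's descriptions in plain C. */
bool is_ascii_control(int c)   { return (c >= 0x00 && c <= 0x1f) || c == 0x7f; }
bool is_ascii_number(int c)    { return c >= '0' && c <= '9'; }
bool is_ascii_uppercase(int c) { return c >= 'A' && c <= 'Z'; }
bool is_ascii_lowercase(int c) { return c >= 'a' && c <= 'z'; }
bool is_ascii_letter(int c)    { return is_ascii_uppercase(c) || is_ascii_lowercase(c); }
bool is_ascii_printable(int c) { return c >= 0x00 && c <= 0x7f && !is_ascii_control(c); }
bool is_ascii_special(int c) {
    return is_ascii_printable(c) && !is_ascii_number(c) && !is_ascii_letter(c);
}

/* Counts the ASCII code points where a <ctype.h> function disagrees
 * with the class that the table gives for the matching ASCII_C_* rule. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int c = 0x00; c <= 0x7f; c++) {
        mismatches += (isalnum(c) != 0) != (is_ascii_number(c) || is_ascii_letter(c));
        mismatches += (isalpha(c) != 0) != is_ascii_letter(c);
        mismatches += (isblank(c) != 0) != (c == ' ' || c == '\t');
        mismatches += (iscntrl(c) != 0) != is_ascii_control(c);
        mismatches += (isdigit(c) != 0) != is_ascii_number(c);
        mismatches += (isgraph(c) != 0) != (is_ascii_printable(c) && c != ' ');
        mismatches += (islower(c) != 0) != is_ascii_lowercase(c);
        mismatches += (isprint(c) != 0) != is_ascii_printable(c);
        mismatches += (ispunct(c) != 0) != (is_ascii_special(c) && c != ' ');
        mismatches += (isspace(c) != 0) != (c == ' ' || (c >= '\t' && c <= '\r'));
        mismatches += (isupper(c) != 0) != is_ascii_uppercase(c);
        mismatches += (isxdigit(c) != 0) !=
                      (is_ascii_number(c) || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f'));
    }
    return mismatches;
}
```

Under the stated assumptions, calling `assert(count_mismatches() == 0);` from a test program should pass. Note that the space character counts as an `ASCII_Special_Character` in this reading, which is why `ASCII_C_punct` is described as excluding it.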
char/unicode_general_category.peg
An import file that defines rules to match a Unicode character belonging to a specific general category.
The following PEG rules are available.
Rule Name | Description |
---|---|
`Unicode_Uppercase_Letter` | Matches an uppercase letter. |
`Unicode_Lowercase_Letter` | Matches a lowercase letter. |
`Unicode_Titlecase_Letter` | Matches a digraph encoded as a single character, with the first part uppercase. |
`Unicode_Cased_Letter` | Matches a cased letter (= `Unicode_Uppercase_Letter / Unicode_Lowercase_Letter / Unicode_Titlecase_Letter`). |
`Unicode_Modifier_Letter` | Matches a modifier letter. |
`Unicode_Other_Letter` | Matches a letter of another type, including syllables and ideographs. |
`Unicode_Letter` | Matches a letter (= `Unicode_Cased_Letter / Unicode_Modifier_Letter / Unicode_Other_Letter`). |
`Unicode_Nonspacing_Mark` | Matches a nonspacing combining mark (zero advance width). |
`Unicode_Spacing_Mark` | Matches a spacing combining mark (positive advance width). |
`Unicode_Enclosing_Mark` | Matches an enclosing combining mark. |
`Unicode_Mark` | Matches a mark (= `Unicode_Nonspacing_Mark / Unicode_Spacing_Mark / Unicode_Enclosing_Mark`). |
`Unicode_Decimal_Number` | Matches a decimal digit. |
`Unicode_Letter_Number` | Matches a letterlike numeric character. |
`Unicode_Other_Number` | Matches a numeric character of another type. |
`Unicode_Number` | Matches a numeric character (= `Unicode_Decimal_Number / Unicode_Letter_Number / Unicode_Other_Number`). |
`Unicode_Connector_Punctuation` | Matches a connecting punctuation mark, like a tie. |
`Unicode_Dash_Punctuation` | Matches a dash or hyphen punctuation mark. |
`Unicode_Open_Punctuation` | Matches an opening punctuation mark (of a pair). |
`Unicode_Close_Punctuation` | Matches a closing punctuation mark (of a pair). |
`Unicode_Initial_Punctuation` | Matches an initial quotation mark. |
`Unicode_Final_Punctuation` | Matches a final quotation mark. |
`Unicode_Other_Punctuation` | Matches a punctuation mark of another type. |
`Unicode_Punctuation` | Matches a punctuation mark (= `Unicode_Connector_Punctuation / Unicode_Dash_Punctuation / Unicode_Open_Punctuation / Unicode_Close_Punctuation / Unicode_Initial_Punctuation / Unicode_Final_Punctuation / Unicode_Other_Punctuation`). |
`Unicode_Math_Symbol` | Matches a symbol of mathematical use. |
`Unicode_Currency_Symbol` | Matches a currency sign. |
`Unicode_Modifier_Symbol` | Matches a non-letterlike modifier symbol. |
`Unicode_Other_Symbol` | Matches a symbol of another type. |
`Unicode_Symbol` | Matches a symbol (= `Unicode_Math_Symbol / Unicode_Currency_Symbol / Unicode_Modifier_Symbol / Unicode_Other_Symbol`). |
`Unicode_Space_Separator` | Matches a space character (of various non-zero widths). |
`Unicode_Line_Separator` | Matches U+2028 “LINE SEPARATOR” only. |
`Unicode_Paragraph_Separator` | Matches U+2029 “PARAGRAPH SEPARATOR” only. |
`Unicode_Separator` | Matches a separator character (= `Unicode_Space_Separator / Unicode_Line_Separator / Unicode_Paragraph_Separator`). |
`Unicode_Control` | Matches a C0 or C1 control code. |
`Unicode_Format` | Matches a format control character. |
`Unicode_Surrogate` | Matches a surrogate code point. |
`Unicode_Private_Use` | Matches a private-use character. |
`Unicode_Other` | Matches a character of another type (= `Unicode_Control / Unicode_Format / Unicode_Surrogate / Unicode_Private_Use`). |
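
When working from Unicode data files, it can help to map the two-letter General_Category abbreviations (`Lu`, `Nd`, `Zs`, and so on) to the rule names above. The following is a minimal sketch of such a lookup. The functions `rule_for_category` and `group_rule_for_category` are hypothetical, and the pairing of abbreviations with rule names is inferred from the rule names themselves rather than taken from the import file.

```c
#include <assert.h>   /* used only by the self-check shown in the usage note */
#include <stddef.h>
#include <string.h>

struct category_rule {
    const char *abbr;  /* Unicode General_Category abbreviation */
    const char *rule;  /* rule name from the table (pairing is an assumption) */
};

static const struct category_rule CATEGORY_RULES[] = {
    {"Lu", "Unicode_Uppercase_Letter"},      {"Ll", "Unicode_Lowercase_Letter"},
    {"Lt", "Unicode_Titlecase_Letter"},      {"Lm", "Unicode_Modifier_Letter"},
    {"Lo", "Unicode_Other_Letter"},          {"Mn", "Unicode_Nonspacing_Mark"},
    {"Mc", "Unicode_Spacing_Mark"},          {"Me", "Unicode_Enclosing_Mark"},
    {"Nd", "Unicode_Decimal_Number"},        {"Nl", "Unicode_Letter_Number"},
    {"No", "Unicode_Other_Number"},          {"Pc", "Unicode_Connector_Punctuation"},
    {"Pd", "Unicode_Dash_Punctuation"},      {"Ps", "Unicode_Open_Punctuation"},
    {"Pe", "Unicode_Close_Punctuation"},     {"Pi", "Unicode_Initial_Punctuation"},
    {"Pf", "Unicode_Final_Punctuation"},     {"Po", "Unicode_Other_Punctuation"},
    {"Sm", "Unicode_Math_Symbol"},           {"Sc", "Unicode_Currency_Symbol"},
    {"Sk", "Unicode_Modifier_Symbol"},       {"So", "Unicode_Other_Symbol"},
    {"Zs", "Unicode_Space_Separator"},       {"Zl", "Unicode_Line_Separator"},
    {"Zp", "Unicode_Paragraph_Separator"},   {"Cc", "Unicode_Control"},
    {"Cf", "Unicode_Format"},                {"Cs", "Unicode_Surrogate"},
    {"Co", "Unicode_Private_Use"},
};

/* Returns the specific rule name for an abbreviation, or NULL if the
 * abbreviation has no counterpart in the table (for example "Cn"). */
const char *rule_for_category(const char *abbr) {
    if (abbr == NULL) return NULL;
    for (size_t i = 0; i < sizeof CATEGORY_RULES / sizeof CATEGORY_RULES[0]; i++) {
        if (strcmp(CATEGORY_RULES[i].abbr, abbr) == 0) return CATEGORY_RULES[i].rule;
    }
    return NULL;
}

/* Returns the top-level grouping rule that covers the abbreviation, or NULL. */
const char *group_rule_for_category(const char *abbr) {
    if (rule_for_category(abbr) == NULL) return NULL;
    switch (abbr[0]) {
    case 'L': return "Unicode_Letter";
    case 'M': return "Unicode_Mark";
    case 'N': return "Unicode_Number";
    case 'P': return "Unicode_Punctuation";
    case 'S': return "Unicode_Symbol";
    case 'Z': return "Unicode_Separator";
    case 'C': return "Unicode_Other";
    default:  return NULL;
    }
}
```

For example, `assert(strcmp(group_rule_for_category("Lt"), "Unicode_Letter") == 0);` should pass. The unassigned category `Cn` has no row in the table, so both functions return `NULL` for it.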