packcc

Import Files of Character Matching Rules

Overview

This directory contains import files that define character matching rules. It is recommended to import these files after the last rule of the PEG file that imports them.
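The recommended placement can be sketched as follows. This is a minimal, hypothetical grammar (the file name `ident.peg`, the prefix `ident`, and the rules `token` and `identifier` are illustrative and not part of this directory). It assumes PackCC's `%import` directive and assumes that the import path is written relative to a directory PackCC searches for import files. Because the first rule of a PEG file is the start rule, placing the import after the last rule keeps `token` as the start rule.

```peg
# ident.peg: a hypothetical grammar that prints each ASCII identifier it reads.
%prefix "ident"

%source {
#include <stdio.h>
}

# The first rule in the file is the start rule.
# Identifiers are reported, any other character is skipped, and "!." accepts end of input.
token <- identifier / . / !.

identifier <- ( ASCII_Letter / '_' ) ( ASCII_C_alnum / '_' )* { printf("identifier: %s\n", $0); }

# Imported after the last rule of this file, as recommended above.
%import "char/ascii_character_group.peg"

%%
int main(void) {
    ident_context_t *ctx = ident_create(NULL);
    while (ident_parse(ctx, NULL)) {} /* parse until the input is exhausted */
    ident_destroy(ctx);
    return 0;
}
```

The generated parser reads standard input by default, so it can be tried by piping text into the compiled program.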

Import Files

char/ascii_character_group.peg

Synopsis

An import file that defines rules to match an ASCII character belonging to a specific character group.

PEG Rules

The following PEG rules are available.

Rule Name Description
ASCII_Printable_Character Matches a printable character, i.e. a character other than control characters ([\x20-\x7e]).
ASCII_Letter Matches an alphabetic character ([A-Za-z]).
ASCII_Control_Character Matches a control character ([\x00-\x1f\x7f]).
ASCII_Special_Character Matches a printable character that is neither a digit nor an alphabetic character, i.e. a punctuation character or the space character ' '.
ASCII_Number Matches a decimal digit ([0-9]).
ASCII_Uppercase_Letter Matches an uppercase alphabetic character ([A-Z]).
ASCII_Lowercase_Letter Matches a lowercase alphabetic character ([a-z]).
ASCII_C_alnum Matches a character for which the standard C function isalnum() returns a non-zero value ([0-9A-Za-z]).
ASCII_C_alpha Matches a character for which the standard C function isalpha() returns a non-zero value (= ASCII_Letter).
ASCII_C_blank Matches a character for which the standard C function isblank() returns a non-zero value ([ \t]).
ASCII_C_cntrl Matches a character for which the standard C function iscntrl() returns a non-zero value (= ASCII_Control_Character).
ASCII_C_digit Matches a character for which the standard C function isdigit() returns a non-zero value (= ASCII_Number).
ASCII_C_graph Matches a character for which the standard C function isgraph() returns a non-zero value (= ASCII_Printable_Character excluding the space character ' ').
ASCII_C_lower Matches a character for which the standard C function islower() returns a non-zero value (= ASCII_Lowercase_Letter).
ASCII_C_print Matches a character for which the standard C function isprint() returns a non-zero value (= ASCII_Printable_Character).
ASCII_C_punct Matches a character for which the standard C function ispunct() returns a non-zero value (= ASCII_Special_Character excluding the space character ' ').
ASCII_C_space Matches a character for which the standard C function isspace() returns a non-zero value ([ \t\n\v\f\r]).
ASCII_C_upper Matches a character for which the standard C function isupper() returns a non-zero value (= ASCII_Uppercase_Letter).
ASCII_C_xdigit Matches a character for which the standard C function isxdigit() returns a non-zero value ([0-9A-Fa-f]).

char/unicode_general_category.peg

Synopsis

An import file that defines rules to match a Unicode character belonging to a specific general category.

PEG Rules

The following PEG rules are available.

Rule Name Description
Unicode_Uppercase_Letter Matches an uppercase letter.
Unicode_Lowercase_Letter Matches a lowercase letter.
Unicode_Titlecase_Letter Matches a digraph encoded as a single character, with the first part uppercase.
Unicode_Cased_Letter Matches a cased letter (= Unicode_Uppercase_Letter / Unicode_Lowercase_Letter / Unicode_Titlecase_Letter).
Unicode_Modifier_Letter Matches a modifier letter.
Unicode_Other_Letter Matches a letter of other type, including syllables and ideographs.
Unicode_Letter Matches a letter (= Unicode_Cased_Letter / Unicode_Modifier_Letter / Unicode_Other_Letter).
Unicode_Nonspacing_Mark Matches a nonspacing combining mark (zero advance width).
Unicode_Spacing_Mark Matches a spacing combining mark (positive advance width).
Unicode_Enclosing_Mark Matches an enclosing combining mark.
Unicode_Mark Matches a mark (= Unicode_Nonspacing_Mark / Unicode_Spacing_Mark / Unicode_Enclosing_Mark).
Unicode_Decimal_Number Matches a decimal digit.
Unicode_Letter_Number Matches a letterlike numeric character.
Unicode_Other_Number Matches a numeric character of other type.
Unicode_Number Matches a numeric character (= Unicode_Decimal_Number / Unicode_Letter_Number / Unicode_Other_Number).
Unicode_Connector_Punctuation Matches a connecting punctuation mark, like a tie.
Unicode_Dash_Punctuation Matches a dash or hyphen punctuation mark.
Unicode_Open_Punctuation Matches an opening punctuation mark (of a pair).
Unicode_Close_Punctuation Matches a closing punctuation mark (of a pair).
Unicode_Initial_Punctuation Matches an initial quotation mark.
Unicode_Final_Punctuation Matches a final quotation mark.
Unicode_Other_Punctuation Matches a punctuation mark of other type.
Unicode_Punctuation Matches a punctuation mark (= Unicode_Connector_Punctuation / Unicode_Dash_Punctuation / Unicode_Open_Punctuation / Unicode_Close_Punctuation / Unicode_Initial_Punctuation / Unicode_Final_Punctuation / Unicode_Other_Punctuation).
Unicode_Math_Symbol Matches a symbol of mathematical use.
Unicode_Currency_Symbol Matches a currency sign.
Unicode_Modifier_Symbol Matches a non-letterlike modifier symbol.
Unicode_Other_Symbol Matches a symbol of other type.
Unicode_Symbol Matches a symbol (= Unicode_Math_Symbol / Unicode_Currency_Symbol / Unicode_Modifier_Symbol / Unicode_Other_Symbol).
Unicode_Space_Separator Matches a space character (of various non-zero widths).
Unicode_Line_Separator Matches U+2028 “LINE SEPARATOR” only.
Unicode_Paragraph_Separator Matches U+2029 “PARAGRAPH SEPARATOR” only.
Unicode_Separator Matches a separator character (= Unicode_Space_Separator / Unicode_Line_Separator / Unicode_Paragraph_Separator).
Unicode_Control Matches a C0 or C1 control code.
Unicode_Format Matches a format control character.
Unicode_Surrogate Matches a surrogate code point.
Unicode_Private_Use Matches a private-use character.
Unicode_Other Matches a character of other type (= Unicode_Control / Unicode_Format / Unicode_Surrogate / Unicode_Private_Use).

char/unicode_derived_core.peg

Synopsis

An import file that defines rules to match a Unicode character belonging to a specific derived core property.

PEG Rules

The following PEG rules are available.

Rule Name Description
Unicode_Lowercase Matches a character with the Lowercase property.
Unicode_Uppercase Matches a character with the Uppercase property.
Unicode_Cased Matches a character which is considered to be an uppercase, lowercase, or titlecase character.
Unicode_Case_Ignorable Matches a character which is ignored for casing purposes.
Unicode_Changes_When_Lowercased Matches a character whose normalized form is not stable under a toLowercase mapping.
Unicode_Changes_When_Uppercased Matches a character whose normalized form is not stable under a toUppercase mapping.
Unicode_Changes_When_Titlecased Matches a character whose normalized form is not stable under a toTitlecase mapping.
Unicode_Changes_When_Casefolded Matches a character whose normalized form is not stable under case folding.
Unicode_Changes_When_Casemapped Matches a character which may change when it undergoes case mapping.
Unicode_Alphabetic Matches a character with the Alphabetic property.
Unicode_Default_Ignorable_Code_Point Matches a character which should be ignored in rendering unless explicitly supported by programs.
Unicode_Grapheme_Base Matches a character with the property used to define “Grapheme base”.
Unicode_Grapheme_Extend Matches a character with the property used to define “Grapheme extender”.
Unicode_Math Matches a character with the Math property.
Unicode_ID_Start Matches a character which may be used as the first character of an identifier in a programming language.
Unicode_ID_Continue Matches a character which may be used as the second or a subsequent character of an identifier in a programming language.
Unicode_XID_Start Matches a Unicode_ID_Start character, with the modifications needed to keep identifiers closed under NFKC normalization.
Unicode_XID_Continue Matches a Unicode_ID_Continue character, with the modifications needed to keep identifiers closed under NFKC normalization.
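
As an illustration of the identifier-related rules above, the following minimal sketch recognizes Unicode identifiers. The grammar itself is hypothetical (the prefix `uident` and the rules `token` and `identifier` are not part of this directory). It assumes PackCC's `%import` directive, an import path that PackCC can resolve, and UTF-8 encoded input.

```peg
# A hypothetical grammar that prints each Unicode identifier it reads.
%prefix "uident"

%source {
#include <stdio.h>
}

# Start rule: report identifiers, skip any other character, accept end of input.
token <- identifier / . / !.

# One start character followed by any number of continue characters.
identifier <- Unicode_XID_Start Unicode_XID_Continue* { printf("identifier: %s\n", $0); }

# Imported after the last rule of this file.
%import "char/unicode_derived_core.peg"

%%
int main(void) {
    uident_context_t *ctx = uident_create(NULL);
    while (uident_parse(ctx, NULL)) {} /* parse until the input is exhausted */
    uident_destroy(ctx);
    return 0;
}
```

Replacing the two rules with Unicode_ID_Start and Unicode_ID_Continue gives the variant that is not adjusted for NFKC normalization.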