packcc

Import Files of Character Matching Rules

Overview

This directory contains import files that define character matching rules. It is recommended to import these files after the last rule of the PEG file that imports them.

Import Files

`char/ascii_character_group.peg`

Synopsis

An import file that defines rules to match an ASCII character belonging to a specific character group.

PEG Rules

The following PEG rules are available.

Rule Name	Description
`ASCII_Printable_Character`	Matches a printable character, i.e. any ASCII character other than a control character (`[\x20-\x7e]`).
`ASCII_Letter`	Matches an alphabetic character (`[A-Za-z]`).
`ASCII_Control_Character`	Matches a control character (`[\x00-\x1f\x7f]`).
`ASCII_Special_Character`	Matches a character other than control characters, number characters, and alphabetic characters.
`ASCII_Number`	Matches a number character (`[0-9]`).
`ASCII_Uppercase_Letter`	Matches an uppercase alphabetic character (`[A-Z]`).
`ASCII_Lowercase_Letter`	Matches a lowercase alphabetic character (`[a-z]`).
`ASCII_C_alnum`	Matches a character for which the standard C function `isalnum()` returns a non-zero value (`[0-9A-Za-z]`).
`ASCII_C_alpha`	Matches a character for which the standard C function `isalpha()` returns a non-zero value (= `ASCII_Letter`).
`ASCII_C_blank`	Matches a character for which the standard C function `isblank()` returns a non-zero value (`[ \t]`).
`ASCII_C_cntrl`	Matches a character for which the standard C function `iscntrl()` returns a non-zero value (= `ASCII_Control_Character`).
`ASCII_C_digit`	Matches a character for which the standard C function `isdigit()` returns a non-zero value (= `ASCII_Number`).
`ASCII_C_graph`	Matches a character for which the standard C function `isgraph()` returns a non-zero value (= `ASCII_Printable_Character` excluding the space character `' '`).
`ASCII_C_lower`	Matches a character for which the standard C function `islower()` returns a non-zero value (= `ASCII_Lowercase_Letter`).
`ASCII_C_print`	Matches a character for which the standard C function `isprint()` returns a non-zero value (= `ASCII_Printable_Character`).
`ASCII_C_punct`	Matches a character for which the standard C function `ispunct()` returns a non-zero value (= `ASCII_Special_Character` excluding the space character `' '`).
`ASCII_C_space`	Matches a character for which the standard C function `isspace()` returns a non-zero value (`[ \t\n\v\f\r]`).
`ASCII_C_upper`	Matches a character for which the standard C function `isupper()` returns a non-zero value (= `ASCII_Uppercase_Letter`).
`ASCII_C_xdigit`	Matches a character for which the standard C function `isxdigit()` returns a non-zero value (`[0-9A-Fa-f]`).

`char/unicode_general_category.peg`

Synopsis

An import file that defines rules to match a Unicode character belonging to a specific general category.

PEG Rules

The following PEG rules are available.

Rule Name	Description
`Unicode_Uppercase_Letter`	Matches an uppercase letter.
`Unicode_Lowercase_Letter`	Matches a lowercase letter.
`Unicode_Titlecase_Letter`	Matches a digraph encoded as a single character, with the first part uppercase.
`Unicode_Cased_Letter`	Matches a cased letter (= `Unicode_Uppercase_Letter / Unicode_Lowercase_Letter / Unicode_Titlecase_Letter`).
`Unicode_Modifier_Letter`	Matches a modifier letter.
`Unicode_Other_Letter`	Matches a letter of other type, including syllables and ideographs.
`Unicode_Letter`	Matches a letter (= `Unicode_Cased_Letter / Unicode_Modifier_Letter / Unicode_Other_Letter`).
`Unicode_Nonspacing_Mark`	Matches a nonspacing combining mark (zero advance width).
`Unicode_Spacing_Mark`	Matches a spacing combining mark (positive advance width).
`Unicode_Enclosing_Mark`	Matches an enclosing combining mark.
`Unicode_Mark`	Matches a mark (= `Unicode_Nonspacing_Mark / Unicode_Spacing_Mark / Unicode_Enclosing_Mark`).
`Unicode_Decimal_Number`	Matches a decimal digit.
`Unicode_Letter_Number`	Matches a letterlike numeric character.
`Unicode_Other_Number`	Matches a numeric character of other type.
`Unicode_Number`	Matches a numeric character (= `Unicode_Decimal_Number / Unicode_Letter_Number / Unicode_Other_Number`).
`Unicode_Connector_Punctuation`	Matches a connecting punctuation mark, like a tie.
`Unicode_Dash_Punctuation`	Matches a dash or hyphen punctuation mark.
`Unicode_Open_Punctuation`	Matches an opening punctuation mark (of a pair).
`Unicode_Close_Punctuation`	Matches a closing punctuation mark (of a pair).
`Unicode_Initial_Punctuation`	Matches an initial quotation mark.
`Unicode_Final_Punctuation`	Matches a final quotation mark.
`Unicode_Other_Punctuation`	Matches a punctuation mark of other type.
`Unicode_Punctuation`	Matches a punctuation mark (= `Unicode_Connector_Punctuation / Unicode_Dash_Punctuation / Unicode_Open_Punctuation / Unicode_Close_Punctuation / Unicode_Initial_Punctuation / Unicode_Final_Punctuation / Unicode_Other_Punctuation`).
`Unicode_Math_Symbol`	Matches a symbol of mathematical use.
`Unicode_Currency_Symbol`	Matches a currency sign.
`Unicode_Modifier_Symbol`	Matches a non-letterlike modifier symbol.
`Unicode_Other_Symbol`	Matches a symbol of other type.
`Unicode_Symbol`	Matches a symbol (= `Unicode_Math_Symbol / Unicode_Currency_Symbol / Unicode_Modifier_Symbol / Unicode_Other_Symbol`).
`Unicode_Space_Separator`	Matches a space character (of various non-zero widths).
`Unicode_Line_Separator`	Matches U+2028 “LINE SEPARATOR” only.
`Unicode_Paragraph_Separator`	Matches U+2029 “PARAGRAPH SEPARATOR” only.
`Unicode_Separator`	Matches a separator character (= `Unicode_Space_Separator / Unicode_Line_Separator / Unicode_Paragraph_Separator`).
`Unicode_Control`	Matches a C0 or C1 control code.
`Unicode_Format`	Matches a format control character.
`Unicode_Surrogate`	Matches a surrogate code point.
`Unicode_Private_Use`	Matches a private-use character.
`Unicode_Other`	Matches a character of other type (= `Unicode_Control / Unicode_Format / Unicode_Surrogate / Unicode_Private_Use`).

`char/unicode_derived_core.peg`

Synopsis

An import file that defines rules to match a Unicode character belonging to a specific derived core property.

PEG Rules

The following PEG rules are available.

Rule Name	Description
`Unicode_Lowercase`	Matches a character with the Lowercase property.
`Unicode_Uppercase`	Matches a character with the Uppercase property.
`Unicode_Cased`	Matches a character which is considered to be an uppercase, lowercase, or titlecase character.
`Unicode_Case_Ignorable`	Matches a character which is ignored for casing purposes.
`Unicode_Changes_When_Lowercased`	Matches a character whose normalized form is not stable under a toLowercase mapping.
`Unicode_Changes_When_Uppercased`	Matches a character whose normalized form is not stable under a toUppercase mapping.
`Unicode_Changes_When_Titlecased`	Matches a character whose normalized form is not stable under a toTitlecase mapping.
`Unicode_Changes_When_Casefolded`	Matches a character whose normalized form is not stable under case folding.
`Unicode_Changes_When_Casemapped`	Matches a character which may change when it undergoes case mapping.
`Unicode_Alphabetic`	Matches a character with the Alphabetic property.
`Unicode_Default_Ignorable_Code_Point`	Matches a character which should be ignored in rendering unless explicitly supported by programs.
`Unicode_Grapheme_Base`	Matches a character with the property used to define “Grapheme base”.
`Unicode_Grapheme_Extend`	Matches a character with the property used to define “Grapheme extender”.
`Unicode_Math`	Matches a character with the Math property.
`Unicode_ID_Start`	Matches a character which may be used as the first character of an identifier in a programming language.
`Unicode_ID_Continue`	Matches a character which may be used as the second or a subsequent character of an identifier in a programming language.
`Unicode_XID_Start`	Matches a character which can be mapped to a `Unicode_ID_Start` character under NFKC-normalization.
`Unicode_XID_Continue`	Matches a character which can be mapped to a `Unicode_ID_Continue` character under NFKC-normalization.
