The Unicode Collation Algorithm is officially specified in Unicode Technical Standard #10 (UTS #10, originally published as a Technical Report), which is available on the Unicode Consortium website.

ICU (International Components for Unicode) also provides much useful information, including an explanation of the collation concepts, on its website.

The basic concept behind the Unicode Collation Algorithm is multilevel comparison. A number of levels are defined, listed as level 1 through level 5 in the bullet points below, and each level defines a type of comparison. Strings are first compared using the primary level, also called level 1.

If the order can be determined at the primary level, the algorithm is done. If not, the secondary level (level 2) is applied. If the secondary level determines the order, the algorithm is done; otherwise the tertiary level is applied, and so on. There is typically a final tie-breaking level to determine the order if it cannot be resolved by the prior levels.
• Level 1 – Primary Level for Base Characters. Differences between base characters, such as letters and digits, determine the order, for example A < B.
• Level 2 – Secondary Level for Accents. If there are no primary level differences, then the presence or absence of accents and other such marks determines the order, for example a < á.
• Level 3 – Tertiary Level for Case. If there are no primary or secondary level differences, then a difference in case determines the order, for example a < A.
• Level 4 – Quaternary Level for Punctuation. If there are no primary, secondary, or tertiary level differences, then the presence or absence of white space characters, control characters, and punctuation determines the order, for example -A < A.
• Level 5 – Identical Level for Tie-Breaking. If there are no primary, secondary, tertiary, or quaternary level differences, then some other difference, such as the code point values, determines the order.
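
The level-by-level fallback described above can be illustrated with a small Python sketch. This is a toy model, not the Unicode Collation Algorithm itself: it uses no real collation element table, and the names toy_sort_key and deciding_level are hypothetical helpers introduced only for this illustration. It assumes that a base character's primary weight can be approximated by its lowercase form, that any combining mark counts as an accent, and that punctuation, separators, and control characters are ignored until the quaternary level.

```python
import unicodedata

LEVELS = ("primary", "secondary", "tertiary", "quaternary", "identical")


def toy_sort_key(s: str):
    """Return one list of weights per level (toy weights, not UCA data)."""
    primary, secondary, tertiary, quaternary = [], [], [], []
    # Decompose so that an accented letter becomes base letter + combining mark.
    for ch in unicodedata.normalize("NFD", s):
        category = unicodedata.category(ch)
        if category.startswith("M"):
            # Combining mark: flag the preceding base character as accented.
            if secondary:
                secondary[-1] = 1
            continue
        if category[0] in "PZC":
            # Punctuation, separators, control: invisible on levels 1 to 3,
            # recorded with a low weight on level 4.
            quaternary.append(0)
            continue
        primary.append(ch.lower())                  # level 1: base character
        secondary.append(0)                         # level 2: no accent yet
        tertiary.append(1 if ch.isupper() else 0)   # level 3: lowercase first
        quaternary.append(1)                        # level 4: ordinary character
    identical = [ord(c) for c in s]                 # level 5: raw code points
    return (primary, secondary, tertiary, quaternary, identical)


def deciding_level(a: str, b: str):
    """Name the first level at which a and b differ, or None if they are equal."""
    for name, key_a, key_b in zip(LEVELS, toy_sort_key(a), toy_sort_key(b)):
        if key_a != key_b:
            return name
    return None


words = ["B", "A", "a", "\u00e1", "-A"]   # "\u00e1" is á
print(sorted(words, key=toy_sort_key))
for a, b in [("A", "B"), ("a", "\u00e1"), ("a", "A"), ("-A", "A")]:
    print(repr(a), "<", repr(b), "decided at the", deciding_level(a, b), "level")
```

Because Python compares tuples element by element, sorting on this key applies the levels in order and consults a later level only when all earlier ones tie, which mirrors the fallback described above. For real applications, a collation library such as ICU should be used instead of a hand-rolled key.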