Merge pull request #218 from sailpoint-oss/feature/docs-transform-decomposediacriticalmarks

Added info about decomposediacriticalmarks transform.
James Haytko
2023-04-14 09:42:28 -05:00
committed by GitHub

@@ -21,6 +21,10 @@ The following are examples of diacritical marks:
> - Ň
> - Ŵ
The decomposeDiacriticalMarks transform uses Java's [Normalizer class](https://docs.oracle.com/javase/7/docs/api/java/text/Normalizer.html) to decompose characters that carry diacritical marks. Specifically, it uses Normalization Form KD (NFKD), as described in Sections 3.6, 3.10, and 3.11 of the Unicode Standard and summarized under [Annex 4: Decomposition](https://www.unicode.org/reports/tr15/tr15-23.html#Decomposition).
After decomposition, the transform uses a [regex replace](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) to remove all combining diacritical marks by matching the Unicode `InCombiningDiacriticalMarks` block (U+0300 through U+036F), e.g. `replaceAll("[\\p{InCombiningDiacriticalMarks}]", "")`.
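
The two stages can be observed on a single character. The following is a minimal sketch, not the transform's actual source: the class and method names (`DecompositionSketch`, `decompose`, `stripMarks`, `codePoints`) are hypothetical, and only the `Normalizer.normalize` and `replaceAll` calls come from the description above.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration; not part of the transform's source.
public class DecompositionSketch {

    // Stage 1: NFKD splits a precomposed character into its base character plus combining marks.
    public static String decompose(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKD);
    }

    // Stage 2: remove every character in the Combining Diacritical Marks block.
    public static String stripMarks(String decomposed) {
        return decomposed.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    }

    // Renders a string as a space-separated list of code points.
    public static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String input = "\u0147"; // Ň, LATIN CAPITAL LETTER N WITH CARON
        String decomposed = decompose(input);
        System.out.println(codePoints(input));                  // U+0147
        System.out.println(codePoints(decomposed));             // U+004E U+030C
        System.out.println(codePoints(stripMarks(decomposed))); // U+004E
    }
}
```

Printing code points rather than the characters themselves makes the intermediate form visible: after stage 1 the string still renders as "Ň", but it now consists of two code points, and stage 2 drops the second one.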
## Transform Structure
The transform for decompose diacritical marks requires only the transform's `type` and `name` attributes:
@@ -88,3 +92,18 @@ Output: "Dubcek"
"name": "Decompose Diacritical Marks Transform"
}
```
## Testing
To verify your own implementation, run the following Java code, which mirrors what the transform does, and compare its results with your code's output:
```java
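import java.text.Normalizer;

public class DecomposeDiacriticalMarks {
    public static String decompose(String input) {
        // Decompose each character into its base character and combining marks (NFKD)
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        // Remove the combining marks (Unicode block U+0300 through U+036F)
        return decomposed.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    }
}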
```
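
When comparing results, it can help to include inputs that go beyond accented Latin letters, because NFKD also applies compatibility mappings and some letters have no decomposition at all. The harness below is a sketch under those assumptions: the class name `DecomposeCheck`, the method name, and the chosen test cases are hypothetical, and the expected values come from Unicode decomposition data rather than from the transform's documentation.

```java
import java.text.Normalizer;

// Hypothetical test harness for illustration; these names are not part of any SailPoint API.
public class DecomposeCheck {

    // Same two steps as the snippet above: NFKD, then strip combining marks.
    public static String decomposeDiacriticalMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        return decomposed.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"\u0174", "W"},      // Ŵ: base letter plus combining circumflex
            {"\uFB01", "fi"},     // ﬁ ligature: NFKD expands it to two letters
            {"\u00F8", "\u00F8"}, // ø: has no decomposition, so it passes through unchanged
        };
        for (String[] c : cases) {
            String actual = decomposeDiacriticalMarks(c[0]);
            String note = actual.equals(c[1]) ? "" : " (expected " + c[1] + ")";
            System.out.println(c[0] + " -> " + actual + note);
        }
    }
}
```

The second and third cases are the ones most likely to differ between implementations: code that uses NFD instead of NFKD leaves the ligature intact, and code that maps letters through a lookup table may convert "ø" to "o", which this approach does not do.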