Mirror of https://github.com/Artikash/Textractor.git, synced 2024-12-23 17:04:12 +08:00.

Updated FAQ (markdown): commit 1934c19d59 (parent 7238eb28b8), FAQ.md
## Textractor is extracting text *mostly* correctly, but there are some extra markup/garbage characters (e.g. a `\n` in place of every line break). Is there a way to clean the text?
Yup, use the `Regex Filter` or `Replacer` extension. Remember to put the extension near the top of the list so the other extensions see the cleaned text.
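To see why the order matters, here is a minimal Python sketch that models the extension list as functions applied from top to bottom. It is only a model under assumptions: Textractor's real extensions are DLLs, not Python functions, and `regex_filter`, `char_counter` and `run_chain` are hypothetical names made up for this example. The counter stands in for any later extension (a translator, say) whose output depends on exactly the text it receives.

```python
import re
from typing import Callable, List

# Hypothetical model: an "extension" is a function from sentence to sentence.
Extension = Callable[[str], str]


def regex_filter(pattern: str) -> Extension:
    """Hypothetical stand-in for a regex filter: delete every match of `pattern`."""
    compiled = re.compile(pattern)
    return lambda sentence: compiled.sub("", sentence)


def char_counter(sentence: str) -> str:
    """Hypothetical downstream extension whose output depends on its input."""
    return f"{sentence} ({len(sentence)} chars)"


def run_chain(sentence: str, extensions: List[Extension]) -> str:
    """Apply extensions in list order; each one sees the previous one's output."""
    for extension in extensions:
        sentence = extension(sentence)
    return sentence


# Game text with a literal backslash-n where a line break should be.
raw = "Hello\\nworld"
strip_newline_markup = regex_filter(r"\\n")

clean_first = run_chain(raw, [strip_newline_markup, char_counter])
clean_last = run_chain(raw, [char_counter, strip_newline_markup])
print(clean_first)  # Helloworld (10 chars)
print(clean_last)   # Helloworld (12 chars)
```

With the filter first, the counter sees the cleaned 10-character sentence. With the filter last, the counter has already counted the two stray `\n` characters, and removing them afterwards can't undo that.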
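Some useful regex filters:

`\s` (filters all whitespace)

`[\u0021-\u00ff]` (filters all printable ASCII characters except the space, i.e. English letters, digits and punctuation, plus the Latin-1 Supplement, which covers most Western European accented letters)

`[\u0100-\uffff]` (filters every character from U+0100 upward, leaving only ASCII and Latin-1 text; this also removes Greek, Cyrillic, and accented Latin letters outside Latin-1, such as Polish `ł`)

`[\u0000-\u2fff\ua000-\uffff]` (filters everything outside U+3000 to U+9FFF, which keeps CJK punctuation, Japanese kana and CJK ideographs; Korean Hangul syllables (U+AC00 to U+D7AF) and full-width forms like `！` (U+FF00 to U+FFEF) are outside that range, so they get filtered too)

`<.+?>` (filters HTML tags like `<p id="some_guid">` and `</span>`)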
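If you want to check what a pattern will remove before adding it to Textractor, the sketch below runs the patterns above through Python's `re` module on a few sample strings. This is a stand-in, not Textractor's code: Textractor's extensions are C++, and their regex engine may treat edge cases (such as which characters `\s` counts as whitespace) differently. `FILTERS` and `apply_filter` are hypothetical names, and the sketch assumes a filter simply deletes each match.

```python
import re

# Patterns copied from the list above.
FILTERS = {
    "whitespace": r"\s",
    "ascii_latin1": r"[\u0021-\u00ff]",
    "above_latin1": r"[\u0100-\uffff]",
    "non_cjk": r"[\u0000-\u2fff\ua000-\uffff]",
    "html_tags": r"<.+?>",
}


def apply_filter(name: str, text: str) -> str:
    """Delete every match of the named pattern (assumed Regex Filter behaviour)."""
    return re.sub(FILTERS[name], "", text)


print(apply_filter("whitespace", "Hello  world\n"))      # Helloworld
print(apply_filter("ascii_latin1", "Café 東京!"))        # " 東京" (the space survives)
print(apply_filter("above_latin1", "Café 東京!"))        # Café !
# Hangul (안녕) and the full-width ！ are removed along with the Latin text.
print(apply_filter("non_cjk", "Hi「東京」です 안녕！"))  # 「東京」です

html = '<p id="some_guid">Hello</p> <span>x</span>'
print(apply_filter("html_tags", html))                   # Hello x
# A greedy <.+> instead matches from the first "<" to the last ">".
print(repr(re.sub(r"<.+>", "", html)))                   # ''
```

The last two lines show why the pattern uses the lazy `+?`: it ends each match at the nearest `>`, while the greedy version also deletes the text between the tags.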
## Textractor is extracting text with some characters missing or is unable to extract any text remotely close to what I need. How do I extract the correct text?
Oof, looks like you found a game with an engine that Textractor doesn't natively support. There are two things you should try: