EXTRACTTEXT
Overview
The EXTRACTTEXT workflow application extracts text content from an input file (.pdf, .docx, or .txt) and returns the extracted text and its length. It supports optional parameters for maximum file size, trimming, and text normalization (Unix-style line breaks).
Required parameters
| Parameter | Type | Direction | Description |
| --- | --- | --- | --- |
| FILE | FILE | IN | The file from which to extract the text (must be .pdf, .docx, or .txt) |
| TEXT | TEXT | OUT | The extracted (and possibly normalized/trimmed) text |
| LENGTH | NUMERIC | OUT | The length (number of characters) of the extracted text |
Optional parameters
| Parameter | Type | Direction | Description |
| --- | --- | --- | --- |
| MAX_FILE_SIZE | NUMERIC | IN | Maximum allowed file size in MB |
| TRIM_SIZE | NUMERIC | IN | Maximum number of characters to keep from the extracted text |
| NORMALIZE | TEXT | IN | Whether to normalize line endings. Possible values: Y, N, true, false |
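
The following Python sketch illustrates how the optional parameters could act on text that has already been extracted. It is not the application's implementation. The function names (`parse_flag`, `within_size_limit`, `postprocess`) are hypothetical, and several behaviors are assumptions not stated on this page: that NORMALIZE values are matched case-insensitively, that normalization runs before trimming, that 1 MB means 1024 × 1024 bytes, and that omitting an optional parameter disables the corresponding step.

```python
from typing import Optional, Tuple


def parse_flag(value: str) -> bool:
    """Interpret the documented NORMALIZE values: Y, N, true, false.

    Case-insensitive matching is an assumption of this sketch.
    """
    v = value.strip().lower()
    if v in ("y", "true"):
        return True
    if v in ("n", "false"):
        return False
    raise ValueError(f"unsupported NORMALIZE value: {value!r}")


def within_size_limit(size_bytes: int, max_file_size_mb: Optional[float]) -> bool:
    """Check a file size against MAX_FILE_SIZE (assumes 1 MB = 1024 * 1024 bytes)."""
    if max_file_size_mb is None:
        return True  # assumption: no limit when the parameter is omitted
    return size_bytes <= max_file_size_mb * 1024 * 1024


def postprocess(
    text: str,
    normalize: str = "N",
    trim_size: Optional[int] = None,
) -> Tuple[str, int]:
    """Return (TEXT, LENGTH) after optional normalization and trimming."""
    if parse_flag(normalize):
        # Convert Windows (\r\n) and old Mac (\r) line breaks to Unix (\n).
        text = text.replace("\r\n", "\n").replace("\r", "\n")
    if trim_size is not None:
        # Keep at most trim_size characters (assumed to apply after normalization).
        text = text[:trim_size]
    return text, len(text)
```

Under these assumptions, `postprocess("a\r\nb\rc", normalize="Y", trim_size=4)` yields the text `"a\nb\n"` with a length of 4, since the line breaks are unified first and the result is then cut to four characters.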