ocr - Optical Character Recognition
Added in Pro 9.2
The $ocr module provides optical character recognition for extracting text from images. This built-in module is based on PaddleOCR. You must download the official PaddleOCR plugin from Auto.js Pro’s plugin store before using it. When packaging, the plugin can be bundled into the APK, so users don’t need to install it separately.
Auto.js Pro also provides another OCR plugin based on Google MLKit. See MLKitOCR plugin.
Plugin download
LanZou download:
https://wwwq.lanzouc.com/iFRks168m1aj
Note
Special thanks to the Auto.js community member L (QQ: 2056968162, author of the 7Zip plugin) for providing the initial integration code and later helping with bug fixes and optimizations, which greatly reduced development time.
$ocr.create([options])
options {object} Optional options:
models{string} Model.slimis faster with lower accuracy; if omitted, usesdefault(higher accuracy but slower). You can also pass an absolute path to a custom model.labelsFile{string} Label file for the model. Defaultnull. Used together withmodels.cpuPowerMode{string} CPU mode. Default:LITE_POWER_HIGH. Available values:LITE_POWER_HIGHBind to big cores. If the ARM CPU supports big.LITTLE, it prefers and binds to the big cluster. IfparallelThreadsexceeds the number of big cores, it will be scaled down automatically. If the device has no big cores or binding fails (e.g. in low-power mode on some phones), it falls back to no-bind mode.LITE_POWER_LOWBind to little cores. If big.LITTLE is available, it binds to the little cluster. IfparallelThreadsexceeds the number of little cores, it will be scaled down. If no little cores are found, it falls back to no-bind mode.LITE_POWER_FULLMix big + little cores. Thread count may exceed big-core count; if it exceeds total core count, it will be scaled down to the number of cores.LITE_POWER_NO_BINDNo CPU binding (recommended). The system schedules work to idle CPU cores based on load.LITE_POWER_RAND_HIGHRotate binding among big cores. If the big cluster has multiple cores, it switches to the next core every 10 inferences.LITE_POWER_RAND_LOWRotate binding among little cores. If the little cluster has multiple cores, it switches to the next core every 10 inferences.
parallelThreads{number} Parallel threads. Default4.useOpenCL{boolean} Whether to use OpenCL. Defaultfalse.
Returns {OCR} A new OCR instance
Create an OCR instance with the given options. In most cases, you don’t need to customize anything; $ocr.create() is enough.
A simple example: take screenshots and detect text:
// Create OCR. You must install the official PaddleOCR plugin from the plugin store first.
let ocr = $ocr.create({
models: 'slim', // faster but lower accuracy; omit to use default (more accurate but slower)
});
requestScreenCapture();
for (let i = 0; i < 5; i++) {
let capture = captureScreen();
// Detect screenshot text and measure time. The first run is usually slower.
// The time depends on image size/content/text count.
// You can tune threads/CPU mode via $ocr.create() options.
let start = Date.now();
let result = ocr.detect(capture);
let end = Date.now();
console.log(result);
toastLog(`#${i + 1} detect: ${end - start}ms`);
sleep(3000);
}
ocr.release();See also: PaddleOCR docs.
OCR
Object returned by $ocr.create(), used for OCR detection. Call release() when you no longer need it to free resources.
OCR.detect(image[, options])
image{Image} Image to recognize text from.- options {object} Optional options:
max{number} Max number of text items to return. Default1000.detectRotation{boolean} Whether to detect text rotation. Defaultfalse.region{Array} OCR region: an array of 2 or 4 elements.(region[0], region[1])is the top-left corner;region[2] * region[3]is width/height. If only 2 elements are provided, the region is from(region[0], region[1])to the bottom-right corner of the image. If omitted, the region is the whole image. Added in 9.3.
- Returns {Array<OCRResult>} OCR results array (confidence, text, bounds, etc.)
Run OCR on the given image with options and return results as an array.
requestScreenCapture();
sleep(1000);
let ocr = $ocr.create();
let capture = captureScreen();
let result = ocr.detect(capture);
// Iterate results and print text
result.forEach(item => {
console.log(item.text, item.confidence);
});
// Filter results with confidence > 0.9
let filtered = result.filter(item => item.confidence > 0.9);
// Fuzzy search: find a result containing "Auto.js"
let autojs = filtered.find(item => item.text.includes("Auto.js"));
console.log(autojs);
// If found, print confidence/bounds/center and click it
if (autojs) {
console.log(`confidence = $\{autojs.confidence}, bounds = ${autojs.bounds}, center = (${autojs.bounds.centerX()}, ${autojs.bounds.centerY()\})`);
autojs.clickCenter();
}
ocr.release();OCR.release()
Release OCR resources. Resources are released automatically on process exit, but you should release them as soon as you don’t need OCR.
OCRResult
An element of the array returned by $ocr.detect(). Includes confidence, text content, bounds, rotation, and rotation confidence.
OCRResult.confidence
- {number}
Confidence of the recognized text, in the range ([0, 1]). Closer to 1 means more reliable.
OCRResult.text
- {string}
Recognized text content.
OCRResult.bounds
- {Rect}
Bounding rectangle of the recognized text in the image.
OCRResult.rotation
- {number}
Rotation angle of the recognized text in degrees, in the range ([0, 360)). Typically 0 or 180. Effective only when detectRotation is true.
OCRResult.rotationConfidence
- {number}
Confidence of the rotation angle, in the range ([0, 1]). Effective only when detectRotation is true.
OCRResult.javaObject
- {object}
Raw Java object of the OCR result. It’s not very useful for PaddleOCR itself, but other OCR implementations may provide extra metadata such as lines, paragraphs, or word segmentation.
OCRResult.clickCenter()
- Returns {boolean}
Click the center point of this result’s bounds on screen. Returns whether the click succeeded. Equivalent to click(result.bounds.centerX(), result.bounds.centerY()).
