ocr - Optical Character Recognition

老猫, 10/22/22, about 3 min read

Added in Pro 9.2

The $ocr module provides optical character recognition for extracting text from images. This built-in module is based on PaddleOCR. You must download the official PaddleOCR plugin from Auto.js Pro’s plugin store before using it. When packaging, the plugin can be bundled into the APK, so users don’t need to install it separately.

Auto.js Pro also provides another OCR plugin based on Google MLKit. See MLKitOCR plugin.

Plugin download

LanZou download:
https://wwwq.lanzouc.com/iFRks168m1aj

Note

Special thanks to the Auto.js community member L (QQ: 2056968162, author of the 7Zip plugin) for providing the initial integration code and later helping with bug fixes and optimizations, which greatly reduced development time.

$ocr.create([options])

options {object} Optional options:
- models {string} Model to use. 'slim' is faster but less accurate; if omitted, the default model is used (more accurate but slower). You can also pass an absolute path to a custom model.
- labelsFile {string} Label file for the model. Default null. Used together with models.
- cpuPowerMode {string} CPU mode. Default LITE_POWER_HIGH. Available values:
  - LITE_POWER_HIGH Bind to big cores. If the ARM CPU supports big.LITTLE, it prefers and binds to the big cluster. If parallelThreads exceeds the number of big cores, it will be scaled down automatically. If the device has no big cores or binding fails (e.g. in low-power mode on some phones), it falls back to no-bind mode.
  - LITE_POWER_LOW Bind to little cores. If big.LITTLE is available, it binds to the little cluster. If parallelThreads exceeds the number of little cores, it will be scaled down. If no little cores are found, it falls back to no-bind mode.
  - LITE_POWER_FULL Mix big + little cores. Thread count may exceed big-core count; if it exceeds total core count, it will be scaled down to the number of cores.
  - LITE_POWER_NO_BIND No CPU binding (recommended). The system schedules work to idle CPU cores based on load.
  - LITE_POWER_RAND_HIGH Rotate binding among big cores. If the big cluster has multiple cores, it switches to the next core every 10 inferences.
  - LITE_POWER_RAND_LOW Rotate binding among little cores. If the little cluster has multiple cores, it switches to the next core every 10 inferences.
- parallelThreads {number} Parallel threads. Default 4.
- useOpenCL {boolean} Whether to use OpenCL. Default false.
Returns {OCR} A new OCR instance

Create an OCR instance with the given options. In most cases, you don’t need to customize anything; $ocr.create() is enough.

A simple example: take screenshots and detect text:

// Create OCR. You must install the official PaddleOCR plugin from the plugin store first.
let ocr = $ocr.create({
    models: 'slim', // faster but lower accuracy; omit to use default (more accurate but slower)
});

requestScreenCapture();

for (let i = 0; i < 5; i++) {
    let capture = captureScreen();

    // Detect screenshot text and measure time. The first run is usually slower.
    // The time depends on image size/content/text count.
    // You can tune threads/CPU mode via $ocr.create() options.
    let start = Date.now();
    let result = ocr.detect(capture);
    let end = Date.now();
    console.log(result);

    toastLog(`#${i + 1} detect: ${end - start}ms`);
    sleep(3000);
}

ocr.release();
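
The options listed above can be combined into a single object. The following is a minimal sketch, not an official recipe: buildOcrOptions is a hypothetical helper, and it assumes cpuPowerMode is passed as one of the string names listed above. The $ocr call is guarded so the snippet also runs outside Auto.js Pro.

```javascript
// Hypothetical helper (not part of $ocr): builds an options object for $ocr.create()
// using only the option names documented above.
function buildOcrOptions(preferSpeed) {
    let options = {
        cpuPowerMode: "LITE_POWER_NO_BIND", // assumption: passed as a string name
        parallelThreads: 4,                  // documented default
        useOpenCL: false,                    // documented default
    };
    if (preferSpeed) {
        options.models = "slim"; // faster, lower accuracy
    }
    return options;
}

let fastOptions = buildOcrOptions(true);
console.log(JSON.stringify(fastOptions));

// $ocr exists only inside Auto.js Pro with the PaddleOCR plugin installed.
if (typeof $ocr !== "undefined") {
    let tunedOcr = $ocr.create(fastOptions);
    // ... call tunedOcr.detect(...) here ...
    tunedOcr.release();
}
```

Whether these values actually help depends on the device, so it is worth timing detect() with a few combinations before settling on one.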

OCR

Object returned by $ocr.create(), used for OCR detection. Call release() when you no longer need it to free resources.

OCR.detect(image[, options])

image {Image} Image to recognize text from.
options {object} Optional options:
- max {number} Max number of text items to return. Default 1000.
- detectRotation {boolean} Whether to detect text rotation. Default false.
- region {Array} OCR region: an array of 2 or 4 elements. (region[0], region[1]) is the top-left corner; region[2] and region[3] are the width and height. If only 2 elements are provided, the region extends from (region[0], region[1]) to the bottom-right corner of the image. If omitted, the region is the whole image. Added in 9.3.
Returns {Array<OCRResult>} OCR results array (confidence, text, bounds, etc.)

Run OCR on the given image with options and return results as an array.

requestScreenCapture();
sleep(1000);

let ocr = $ocr.create();

let capture = captureScreen();
let result = ocr.detect(capture);
// Iterate results and print text
result.forEach(item => {
    console.log(item.text, item.confidence);
});
// Filter results with confidence > 0.9
let filtered = result.filter(item => item.confidence > 0.9);
// Fuzzy search: find a result containing "Auto.js"
let autojs = filtered.find(item => item.text.includes("Auto.js"));
console.log(autojs);
// If found, print confidence/bounds/center and click it
if (autojs) {
    console.log(`confidence = ${autojs.confidence}, bounds = ${autojs.bounds}, center = (${autojs.bounds.centerX()}, ${autojs.bounds.centerY()})`);
    autojs.clickCenter();
}

ocr.release();
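
To make the region rules concrete, here is a small sketch of how a 2- or 4-element region maps to a rectangle. resolveRegion is a hypothetical helper written for illustration only; it mirrors the description of the region option and is not how the plugin itself is implemented. The Auto.js part is guarded so the snippet also runs elsewhere.

```javascript
// Hypothetical helper: resolves a region array against an image size,
// following the rules described for the region option.
function resolveRegion(region, imageWidth, imageHeight) {
    if (!region) {
        // No region: the whole image.
        return { x: 0, y: 0, width: imageWidth, height: imageHeight };
    }
    if (region.length !== 2 && region.length !== 4) {
        throw new Error("region must have 2 or 4 elements");
    }
    let x = region[0];
    let y = region[1];
    // With 2 elements, the region extends to the bottom-right corner.
    let width = region.length === 4 ? region[2] : imageWidth - x;
    let height = region.length === 4 ? region[3] : imageHeight - y;
    return { x: x, y: y, width: width, height: height };
}

console.log(resolveRegion([100, 200], 1080, 1920));       // x=100, y=200, width=980, height=1720
console.log(resolveRegion([0, 0, 540, 960], 1080, 1920)); // x=0, y=0, width=540, height=960

// Inside Auto.js Pro (9.3 or later): only scan the top half of a screenshot.
if (typeof $ocr !== "undefined") {
    requestScreenCapture();
    let regionOcr = $ocr.create();
    let shot = captureScreen();
    let topHalf = [0, 0, shot.getWidth(), Math.floor(shot.getHeight() / 2)];
    console.log(regionOcr.detect(shot, { region: topHalf, max: 50 }));
    regionOcr.release();
}
```

Restricting the region is often the simplest way to reduce detection time when the text of interest is known to sit in one part of the screen.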

OCR.release()

Release OCR resources. Resources are released automatically on process exit, but you should release them as soon as you don’t need OCR.
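
One way to make sure release() is always called, even when detection throws, is a try/finally wrapper. This is a sketch only: withOcr is a hypothetical helper, and fakeOcr is a stand-in object so the snippet can run outside Auto.js Pro; it is not the real OCR class.

```javascript
// Hypothetical helper: runs `work` with an OCR instance and always releases it.
function withOcr(createOcr, work) {
    let instance = createOcr();
    try {
        return work(instance);
    } finally {
        // Runs on both normal return and exceptions.
        instance.release();
    }
}

// Stand-in for demonstration outside Auto.js Pro; not the real OCR class.
let fakeOcr = {
    released: false,
    detect: function () { return [{ text: "hello", confidence: 0.99 }]; },
    release: function () { this.released = true; },
};

let texts = withOcr(() => fakeOcr, o => o.detect(null).map(item => item.text));
console.log(texts, fakeOcr.released);
```

Inside a script, the same helper could be called as withOcr(() => $ocr.create(), o => o.detect(captureScreen())).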

OCRResult

An element of the array returned by OCR.detect(). Includes confidence, text content, bounds, rotation, and rotation confidence.

OCRResult.confidence

{number}

Confidence of the recognized text, in the range [0, 1]. Closer to 1 means more reliable.

OCRResult.text

{string}

Recognized text content.

OCRResult.bounds

{Rect}

Bounding rectangle of the recognized text in the image.

OCRResult.rotation

{number}

Rotation angle of the recognized text in degrees, in the range [0, 360). Typically 0 or 180. Effective only when detectRotation is true.

OCRResult.rotationConfidence

{number}

Confidence of the rotation angle, in the range [0, 1]. Effective only when detectRotation is true.
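
As an illustration of how rotation and rotationConfidence might be used together, the sketch below picks out results that are confidently upside down. findUpsideDown and the sample data are hypothetical; real values come from OCR.detect() called with detectRotation set to true.

```javascript
// Hypothetical helper: keeps results rotated by about 180 degrees
// whose rotation confidence meets the given threshold.
function findUpsideDown(results, minRotationConfidence) {
    return results.filter(item =>
        Math.abs(item.rotation - 180) < 1 &&
        item.rotationConfidence >= minRotationConfidence
    );
}

// Made-up sample data shaped like OCRResult, for running outside Auto.js Pro.
let sampleResults = [
    { text: "A", rotation: 0, rotationConfidence: 0.98 },
    { text: "B", rotation: 180, rotationConfidence: 0.95 },
    { text: "C", rotation: 180, rotationConfidence: 0.4 },
];

let flipped = findUpsideDown(sampleResults, 0.8);
console.log(flipped.map(item => item.text)); // [ 'B' ]

// Inside Auto.js Pro, rotation fields are only meaningful with detectRotation enabled:
// let results = ocr.detect(captureScreen(), { detectRotation: true });
```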

OCRResult.javaObject

{object}

Raw Java object of the OCR result. It’s not very useful for PaddleOCR itself, but other OCR implementations may provide extra metadata such as lines, paragraphs, or word segmentation.

OCRResult.clickCenter()

Returns {boolean}

Click the center point of this result’s bounds on screen. Returns whether the click succeeded. Equivalent to click(result.bounds.centerX(), result.bounds.centerY()).
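
A common pattern is to look up a result by its text and click it. The sketch below assumes only the documented text, confidence, and clickCenter() members; clickText is a hypothetical helper, and the sample results are stand-ins so the snippet can run outside Auto.js Pro.

```javascript
// Hypothetical helper: clicks the first result whose text contains `keyword`
// and whose confidence meets the threshold. Returns false if nothing matches.
function clickText(results, keyword, minConfidence) {
    let match = results.find(item =>
        item.confidence >= minConfidence && item.text.includes(keyword)
    );
    return match ? match.clickCenter() : false;
}

// Stand-in results; the real clickCenter() performs a click on screen.
let clickedTexts = [];
function fakeResult(text, confidence) {
    return {
        text: text,
        confidence: confidence,
        clickCenter: function () { clickedTexts.push(text); return true; },
    };
}

let mockResults = [fakeResult("Cancel", 0.97), fakeResult("Confirm order", 0.93)];
console.log(clickText(mockResults, "Confirm", 0.9)); // true
console.log(clickText(mockResults, "Retry", 0.9));   // false
```

Returning false for a missing match lets the caller decide whether to retry with a fresh screenshot or give up.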