Third-party plugins
This page lists Auto.js plugins developed by third-party developers.
Pytorch plugin
This plugin and its documentation are developed and provided by the third-party developer Haoran. Special thanks.
To use this module, install the Pytorch-AjPlugin extension plugin, then load it with let pytorch = $plugins.load("com.hraps.pytorch").
The Pytorch module runs trained deep-learning neural network models on Android devices. It can implement features that are hard to achieve with conventional programs, such as image recognition, translation, and Q&A.
Before use, make sure you have a trained neural network model, and convert the model file into a mobile-friendly TorchScript file using Python, e.g. torch.jit.trace(model, input_tensor).
If you are not familiar with Pytorch, the author (Haoran) recommends learning here: Dive into DL (PyTorch).
You can test device support with pytorch.debugOd() and preview the recognition quality for object detection.
Exporting Pytorch weights
You need to convert .pt to .torchscript.pt to run on mobile. Use tracing to convert the model into a mobile-usable form. Models that rely on third-party libraries may not work well; it is recommended to write and train models in pure Pytorch.
Python conversion script on desktop:
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

model = make_model('resnet18') # build the model structure
model.load_state_dict(torch.load(model_pt)) # load trained weights
model.eval() # set eval mode (required)
input_tensor = torch.rand(1, 3, 224, 224) # input format: 224x224 RGB (3 channels), random tensor
mobile = torch.jit.trace(model, input_tensor, strict=False) # trace to TorchScript
mobile = optimize_for_mobile(mobile) # optional mobile optimization
mobile.save(mobile_pt) # save the file
Basic functions
pytorch.load(path[,device])
- path {String} Model path
- device {int} Execution device: 0 for CPU, 1 for VULKAN. Default 0.
- Returns {PytorchModule}
Load a neural network model. Only a small number of devices support VULKAN; you can omit device.
When path is "yolov5s", it loads the built-in object detection model. When path is "textcnn", it loads the built-in sentiment analysis model.
pytorch.forward(module,input)
- module {PytorchModule} Loaded neural network module
- input {Tensor} Input tensor
- Returns {Tensor} Output tensor
Run forward inference and return the output tensor.
pytorch.forwardTuple(module,input)
- module {PytorchModule} Loaded neural network module
- input {Tensor} Input tensor
- Returns {Tensor} Output tensor
Run forward inference and return the output tensor.
Compared to pytorch.forward(), this returns the first item of a tuple output, suitable for models like object detection. Internally it is essentially module.forward(IValue.from(inputTensor)).toTuple()[0].toTensor().
pytorch.destory(module)
module{PytorchModule} Neural network module to release
Release the neural network module.
Tensor class
Tensor is the common input/output data structure for neural networks. It is a high-dimensional array that enables fast processing. For example, an image of 100×200 with 3 RGB channels is converted to a float array of length 100×200×3 before feeding into the model.
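The flattening described above can be sketched in plain JS. This is only an illustration of the index arithmetic; the [C, H, W] channel-first layout is an assumption based on the usual Pytorch convention, not something the plugin guarantees:

```javascript
// Total number of values in a tensor of the given shape,
// e.g. a [1, 3, 224, 224] image tensor holds 1*3*224*224 floats.
function tensorLength(shape) {
    return shape.reduce(function (a, b) { return a * b; }, 1);
}

// Flat index of (channel c, row y, column x) in an assumed [C, H, W] layout.
function chwIndex(c, y, x, h, w) {
    return c * h * w + y * w + x;
}
```

For the 100×200 RGB example in the text, `tensorLength([3, 100, 200])` gives the 60000-value float array the model consumes.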
pytorch.fromBlob(arr,shape)
- arr {List} JS array
- shape {List} Tensor shape
- Returns {Tensor} Constructed tensor
Construct a tensor from a JS array.
tensor.getDataAsFloatArray()
- Returns {List} Float array converted from the tensor
Convert the tensor to a float array.
tensor.getDataAs[Byte/Double/Float/Int/Long]Array()
- Returns {List<…>} Array converted from the tensor
Convert the tensor to arrays of various primitive types.
ObjectDetection - object detection functions
Object detection analyzes the location and category of items in an image. If you are unsure, use pytorch.debugOd() to preview the effect.
pytorch.debugOd([modulePath,classPath])
- modulePath {String} Model file path (.pt/.pth)
- classPath {String} Class labels file path (.txt)
Test an object detection model file and open the built-in debug page.
Call with no args to use the built-in OD weights and test device support, i.e. pytorch.debugOd().
pytorch.liveOd(modulePath,classPath)
- modulePath {String} Model file path (.pt/.pth)
- classPath {String} Class labels file path (.txt)
Enter the camera real-time recognition page to view streaming results.
ObjectDetection (OD) - commonly used functions
Common helper functions for building your own object detection pipeline.
pytorch.getIOU(a,b)
- a {Rect} Rect A
- b {Rect} Rect B
- Returns {float} IoU value
Compute the overlap ratio (IoU) of two rectangles.
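For reference, the IoU computation can be sketched in plain JS. This is an illustration of the math, not the plugin's implementation; it assumes a Rect exposes left/top/right/bottom as in the images module:

```javascript
// IoU (intersection over union) of two axis-aligned rectangles.
function iou(a, b) {
    // Width/height of the intersection, clamped at 0 when the rects are disjoint.
    var ix = Math.max(0, Math.min(a.right, b.right) - Math.max(a.left, b.left));
    var iy = Math.max(0, Math.min(a.bottom, b.bottom) - Math.max(a.top, b.top));
    var inter = ix * iy;
    var union = (a.right - a.left) * (a.bottom - a.top) +
                (b.right - b.left) * (b.bottom - b.top) - inter;
    return union === 0 ? 0 : inter / union;
}
```

Identical rectangles give 1; disjoint rectangles give 0.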
pytorch.bitmapToTensor(img[,mean,std])
- img {Bitmap} Source image (Bitmap)
- mean {List} Normalization mean. Default [0.485, 0.456, 0.406]
- std {List} Normalization std. Default [0.229, 0.224, 0.225]
- Returns {Tensor} Tensor converted from the image
Convert an image to a Tensor for model input. The image must be a Bitmap. If you have an Auto.js Image, use image.getBitmap() to convert.
mean and std are used to normalize image colors. Use the same values as during training. If unsure, you can set mean [0,0,0] and std [1,1,1].
img = images.read("/sdcard/a.jpg");
inputTensor = pytorch.bitmapToTensor(img.getBitmap(),[0,0,0],[1,1,1]);
pytorch.floatsToResults(floats,row,column,imgScaleX,imgScaleY[,threshold])
- floats {List} Output array from a YOLO model
- row {int} Number of rows (detections)
- column {int} Data width per detection
- imgScaleX {float} Image scale factor on X
- imgScaleY {float} Image scale factor on Y
- threshold {float} Minimum confidence threshold to keep
- Returns {List} All converted results
Convert YOLO output array to a results list. floats is the full output data; floats.length should equal row * column.
If you resized the image to a fixed input size for the network, you can use imgScaleX/imgScaleY to map result coordinates back to the original image.
YOLO model output is composed of detection cells. Each detection consists of confidence, box location, and per-class probabilities. For example, in the built-in YOLO model, each output contains 4 box values (x,y,w,h), 1 confidence value, and 80 class probabilities, so column = 4 + 1 + 80 = 85. row is the number of detection cells.
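The layout above can be illustrated by slicing one detection out of the flat output array. The helper below and its field order are illustrative, based only on the 4 + 1 + 80 layout just described:

```javascript
// Assumed per-detection layout: [x, y, w, h, confidence, 80 class probs].
var COLUMN = 4 + 1 + 80; // 85 values per detection

// Decode one detection row from the flat float array.
function decodeDetection(floats, rowIndex) {
    var base = rowIndex * COLUMN;
    var probs = floats.slice(base + 5, base + COLUMN);
    // Pick the class with the highest probability.
    var best = probs.indexOf(Math.max.apply(null, probs));
    return {
        x: floats[base], y: floats[base + 1],
        w: floats[base + 2], h: floats[base + 3],
        confidence: floats[base + 4],
        classIndex: best
    };
}
```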
pytorch.useNMS(results[,limit,threshold])
- results {List} All results
- limit {int} Max number of remaining results
- threshold {float} Box overlap threshold
- Returns {List} Results after NMS
Filter duplicate results. NMS (Non-Max Suppression) removes boxes whose overlap is higher than threshold.
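The idea behind NMS can be sketched in plain JS as a greedy loop: keep the highest-score box, drop every box that overlaps it beyond the threshold, and repeat. This is an approximation for illustration; pytorch.useNMS's exact behavior may differ:

```javascript
// IoU helper (assumes rects with left/top/right/bottom).
function iou(a, b) {
    var ix = Math.max(0, Math.min(a.right, b.right) - Math.max(a.left, b.left));
    var iy = Math.max(0, Math.min(a.bottom, b.bottom) - Math.max(a.top, b.top));
    var inter = ix * iy;
    var union = (a.right - a.left) * (a.bottom - a.top) +
                (b.right - b.left) * (b.bottom - b.top) - inter;
    return union === 0 ? 0 : inter / union;
}

// Greedy non-max suppression over {rect, score} results.
function nms(results, limit, threshold) {
    var sorted = results.slice().sort(function (p, q) { return q.score - p.score; });
    var kept = [];
    for (var i = 0; i < sorted.length && kept.length < limit; i++) {
        var keep = true;
        for (var j = 0; j < kept.length; j++) {
            if (iou(sorted[i].rect, kept[j].rect) > threshold) { keep = false; break; }
        }
        if (keep) kept.push(sorted[i]);
    }
    return kept;
}
```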
OdResult class
Represents a single object detection result, containing rect, score, and classId.
odResult.score
- score {float} Confidence score
odResult.rect
- rect {Rect} Bounding box. See the images module docs for Rect.
odResult.classIndex
- classIndex {int} Class index
NaturalLanguageProcessing (NLP) functions
Provides helper functions for NLP models.
pytorch.debugTec([modulePath,vocabPath])
- modulePath {String} Model file path (.pt/.pth)
- vocabPath {String} Vocabulary ids file path (.txt)
Test a TextCNN sentiment analysis model file and open the built-in debug page.
Call with no args to use the built-in sentiment weights and test device support, i.e. pytorch.debugTec().
pytorch.simplifySentence(sentence)
- sentence {String} Input sentence
- Returns {String} Simplified sentence
Simplify an English sentence: keep only letters and digits, and convert to lowercase.
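The described behavior can be sketched in plain JS. This is an assumed equivalent for illustration; pytorch.simplifySentence's exact rules may differ:

```javascript
// Keep only letters and digits, lowercase everything, and
// collapse the removed characters into single spaces.
function simplifySentence(sentence) {
    return sentence.toLowerCase()
        .replace(/[^a-z0-9]+/g, " ")
        .trim();
}
```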
Vocab class
Vocabulary class that provides efficient conversion between words and ids.
pytorch.vocabPath(path)
- path {String} Vocabulary file path
- Returns {Vocab} Vocab instance
Load a vocabulary from a file. The file should contain one word per line, and the line-number ↔ word mapping must match the word embedding / vocab used during training.
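The line-number ↔ word mapping can be sketched in plain JS. The helper buildVocab and the zero-based ids are illustrative assumptions; the plugin's Vocab class handles this internally:

```javascript
// Build both directions of the mapping from the file's lines:
// line i <-> id i (zero-based here, which is an assumption).
function buildVocab(lines) {
    var idToWord = lines.slice();
    var wordToId = {};
    for (var i = 0; i < lines.length; i++) {
        wordToId[lines[i]] = i;
    }
    return { idToWord: idToWord, wordToId: wordToId };
}
```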
pytorch.vocab(words)
- words {List} Word list
- Returns {Vocab} Vocab instance
Construct a vocabulary directly from a word list.
vocab.size()
- Returns {long} Vocab size
Get the number of words in the vocab.
vocab.getWord(id)
- id {long} Word id
- Returns {String} Word text
Get the word for the given id.
vocab.getId(word)
- word {String} Word text
- Returns {long} Word id
Get the id for the given word.
vocab.getWords(ids[,length])
- ids {List} Word ids
- length {int} Returned list length
- Returns {List} Word texts
Get the list of words for multiple ids.
vocab.getIds(words[,length])
- words {List} Word texts
- length {int} Returned list length; missing items are padded with 0
- Returns {List} Word ids
Get the list of ids for multiple words.
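The fixed-length padding behavior can be sketched in plain JS. The helper and the choice of 0 for unknown words are assumptions matching the built-in vocab's convention described in the examples below:

```javascript
// Map words to ids, producing exactly `length` entries:
// missing words and trailing padding both become 0.
function getIds(wordToId, words, length) {
    var ids = [];
    for (var i = 0; i < length; i++) {
        var w = words[i];
        ids.push(w !== undefined && wordToId[w] !== undefined ? wordToId[w] : 0);
    }
    return ids;
}
```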
Examples
Image object detection (YOLOv5s)
/**
* Pytorch plugin - object detection (YOLOv5) example
*
* Author: Haoran (Q:2125764918)
*/
// Quick demo (visual debug page):
/*
pytorch = $plugins.load("com.hraps.pytorch")
pytorch.debugOd()
exit()
*/
// Load plugin module
pytorch = $plugins.load("com.hraps.pytorch")
// Load neural network model (built-in YOLOv5s)
var model = pytorch.load("yolov5s")
// Load class names (string array). You can hardcode it, e.g. ["car","plane","person"...]
var classes = pytorch.getCocoClasses()
// Define model input width/height. Input shape: w*h*3
var inputWidth = 640
var inputHeight = 640
// Define output shape: row*column
// row depends on input size (grid count)
var outputRow = 25200
// column = box(x,y,w,h)=4 + score=1 + class probs(80 for COCO) => 85
var outputColumn = 4 + 1 + 80
// Load input image
var img = images.read("/sdcard/DCIM/Camera/b.jpg")
// Resize to model input size
var inputImg = images.resize(img, [inputWidth, inputHeight])
// Convert image to tensor. Use mean=[0,0,0], std=[1,1,1] (no special normalization)
inputTensor = pytorch.bitmapToTensor(inputImg.getBitmap(), [0, 0, 0], [1, 1, 1])
// Run inference to get output tensor
output = pytorch.forwardTuple(model, inputTensor)
// Tensor -> float array
f = output.getDataAsFloatArray()
log("Output length: " + f.length)
// Scale factors
imgScaleX = img.getWidth() / inputWidth
imgScaleY = img.getHeight() / inputHeight
// Convert outputs to results (map back to original image coordinates)
results = pytorch.floatsToResults(f, outputRow, outputColumn, imgScaleX, imgScaleY)
log("Initial detections: " + results.size())
// Apply NMS to remove duplicates
nmsResults = pytorch.useNMS(results)
toastLog("Final detections: " + nmsResults.size())
// Iterate results
for (var i = 0; i < nmsResults.size(); i++) {
result = nmsResults.get(i)
rect = result.rect
str = "Class: " + classes.get(result.classIndex) + " Score: " + result.score +
" Box: left=" + rect.left + " top=" + rect.top + " right=" + rect.right + " bottom=" + rect.bottom;
log(str)
}
Sentiment analysis (TextCNN)
/**
* Pytorch plugin - sentiment analysis (TextCNN) example
*
* Author: Haoran (Q:2125764918)
*/
// Quick demo:
/*
pytorch = $plugins.load("com.hraps.pytorch")
pytorch.debugTec()
exit()
*/
// Input sentence
var text = "The program is useful!"
// Load plugin module
pytorch = $plugins.load("com.hraps.pytorch")
// Load neural network model (built-in textcnn)
var model = pytorch.load("textcnn")
// Load vocab. One word per line. Line number should match the ids used in training (built-in vocab used here)
var vocab = pytorch.getTextcnnVocab()
log("Vocab loaded. Size=" + vocab.size());
// Simplify sentence: remove punctuation, lowercase
var textSimple = pytorch.simplifySentence(text)
log("Simplified: " + textSimple);
// Model input length
var inputSize = 128
// Convert words to ids. Unknown words and padding are filled with 0
var ids = vocab.getIds(textSimple.split(" "), inputSize)
log(ids)
// Build Tensor for model input
var inputTensor = pytorch.fromBlob(ids, [1, 128])
// Run inference
var outputTensor = pytorch.forward(model, inputTensor)
// Output tensor -> float array
var result = outputTensor.getDataAsFloatArray()
log("Model output: " + result[0] + " " + result[1])
// Interpret result
console.info("Sentence: " + text)
if (result[0] <= result[1]) {
console.info("Result: positive")
} else {
console.info("Result: negative")
}
OCR plugin
This plugin and its documentation are developed and provided by the third-party developer Haoran. Special thanks.
This module recognizes text in images (see examples at the end). It is implemented with a deep-learning pipeline: DbNet + AngleNet + CrnnNet.
Download: LanZou Cloud
Recognition
ocr.detect(img[,ratio])
- img {Bitmap} Image to recognize (Bitmap). For an Auto.js Image, call .getBitmap() to convert.
- ratio {float} Scale ratio, default 1. Use a smaller value for small images if needed.
- Returns {List} List of results. See OcrResult below.
OCR recognition. The first call initializes automatically. See examples below.
Filtering
ocr.filterScore(results,dbnetScore,angleScore,crnnScore)
- results {List} Results list to filter
- dbnetScore {float} Minimum confidence for “this region is text”
- angleScore {float} Minimum confidence for angle classification
- crnnScore {float} Minimum average confidence for text recognition
- Returns {List} Filtered results list
Filters low-confidence results. Use 0 for a threshold to disable that filter.
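The filtering can be sketched in plain JS. Averaging the per-character crnnScore list into one value is an assumption based on the parameter descriptions above; this sketch is not the plugin's implementation:

```javascript
// Keep only results whose confidences meet every threshold.
// A threshold of 0 effectively disables that filter.
function filterScore(results, dbnetScore, angleScore, crnnScore) {
    return results.filter(function (r) {
        // Average per-character recognition confidence (assumed semantics).
        var avg = r.crnnScore.reduce(function (a, b) { return a + b; }, 0) /
                  Math.max(1, r.crnnScore.length);
        return r.dbnetScore >= dbnetScore &&
               r.angleScore >= angleScore &&
               avg >= crnnScore;
    });
}
```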
OcrResult class
Each instance represents one OCR result item.
ocrResult.text
- {String} Recognized text
ocrResult.frame
- {List} Result polygon coordinates
The result is an arbitrary quadrilateral, returned as an integer list of length 8: [x1,y1,x2,y2,x3,y3,x4,y4], the four vertices in order.
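When an axis-aligned bounding box is more convenient than the raw quadrilateral, one can be derived from the 8-value frame. The helper below is an illustration, not part of the plugin's API:

```javascript
// Axis-aligned bounds of the quad [x1,y1,x2,y2,x3,y3,x4,y4].
function frameToBounds(frame) {
    var xs = [frame[0], frame[2], frame[4], frame[6]];
    var ys = [frame[1], frame[3], frame[5], frame[7]];
    return {
        left: Math.min.apply(null, xs),
        top: Math.min.apply(null, ys),
        right: Math.max.apply(null, xs),
        bottom: Math.max.apply(null, ys)
    };
}
```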
ocrResult.angleType
- {int} Text angle type
ocrResult.dbnetScore
- {float} Confidence for the region being text
ocrResult.angleScore
- {float} Confidence for the angle type
ocrResult.crnnScore
- {List} Per-character confidence list
Example
// Load plugin
ocr = $plugins.load("com.hraps.ocr")
// Load the image to recognize (set your own path)
img = images.read("./test.jpg")
// OCR
results = ocr.detect(img.getBitmap(),1)
console.info("Results before filter: " + results.size())
// Filter
results = ocr.filterScore(results,0.5,0.5,0.5)
// Print results
for (var i = 0; i < results.size(); i++) {
    var re = results.get(i)
    log("Result:" + i + " Text:" + re.text + " Frame:" + re.frame + " AngleType:" + re.angleType)
    log("Region score:" + re.dbnetScore + " Angle score:" + re.angleScore + " Text score:" + re.crnnScore + "\n")
}