Vision-ml

See also Vision-ui, a series of algorithms for mobile UI testing.

An R-CNN (Region-based Convolutional Neural Network) machine learning model for handling pop-up windows in mobile apps.

Mobile UI Recognition

Vision-ml is a machine learning model that identifies the UI element that closes a Pop-up window and returns its coordinates (x, y) on the screen.

A typical usage scenario would be:

In mobile testing with Appium or a similar UI automation framework, it is often tricky to locate components on a Pop-up window that is rendered on top of the current screen.
Input a screenshot of a mobile app showing the Pop-up, and you get the predicted result (shown in the blue box).
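To make the step from a detected box to a single (x, y) coordinate concrete, here is a minimal sketch. It assumes a generic R-CNN-style pipeline in which candidate regions have already been scored by the classifier; the function `pick_close_button`, the `(x, y, w, h, prob)` tuple layout, and the 0.5 threshold are hypothetical and are not taken from Vision-ml's source.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate format: (x, y, w, h, prob), where (x, y) is the
# top-left corner of a region and prob is the classifier's probability
# that the region is a close button.
Candidate = Tuple[int, int, int, int, float]


def pick_close_button(
    candidates: List[Candidate], threshold: float = 0.5
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return ((cx, cy), 1.0) for the best region, or (None, 0.0) if none qualifies."""
    # Highest-probability region, or None when there are no candidates at all.
    best = max(candidates, key=lambda c: c[4], default=None)
    if best is None or best[4] < threshold:
        return None, 0.0  # nothing looks like a close button
    x, y, w, h, _ = best
    # Report the centre of the box as the point to tap.
    return (x + w // 2, y + h // 2), 1.0
```

For example, `pick_close_button([(100, 200, 40, 40, 0.9), (0, 0, 50, 50, 0.3)])` returns `((120, 220), 1.0)`, a position/score pair in the same spirit as the web server's response.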


Requirements

Python 3.6.x

# create a venv before installing the requirements
pip install -r requirements.txt

Usage

You can use Vision with the pre-trained model in "model/trained_model_1.h5". The number in the file name is a version number, which you can change in "all_config.py".
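A minimal sketch of how a version number maps to the model file name, assuming the naming pattern of "model/trained_model_1.h5". The `MODEL_VERSION` constant and the `model_path` helper are hypothetical; the actual setting in all_config.py may be named differently.

```python
# Hypothetical setting; the real variable in all_config.py may be named differently.
MODEL_VERSION = 1


def model_path(version: int) -> str:
    """Build the versioned file name of a trained model, e.g. model/trained_model_1.h5."""
    return f"model/trained_model_{version}.h5"
```

Bumping the version after training a new model keeps older model files side by side instead of overwriting them.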

There are two ways of using Vision.

Predict an image with Python script

Set your image path in "rcnn_predict.py":

model_predict("path/to/image.png", view=True)

Run the script to get the result:

python rcnn_predict.py

Predict an image with a web server

Start the web server

You can also create the server with the Dockerfile, or run it directly:

python vision_server.py

Post an image to the web server:

curl http://localhost:9092/client/vision -F "file=@${IMAGE_PATH}.png"

The response from the web server contains the coordinates of the UI element, along with a score of 0 or 1.0 (0 means not found, 1.0 means found).

{ "code": 0, "data": { "position": [ 618, 1763 ], "score": 1.0 } }
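If you call the server from a Python test script, the response can be interpreted with a small helper. This is a sketch based only on the response shown above; the helper name `parse_vision_response` is hypothetical, and because the meaning of the `code` field is not documented here, it is not checked.

```python
import json
from typing import Optional, Tuple


def parse_vision_response(body: str) -> Optional[Tuple[int, int]]:
    """Return the (x, y) position if the element was found, else None."""
    payload = json.loads(body)
    data = payload.get("data", {})
    # Score 1.0 means found, 0 means not found.
    if data.get("score", 0) < 1.0:
        return None
    x, y = data["position"]
    return x, y
```

Returning `None` for a "not found" response lets a test script skip the tap instead of clicking an arbitrary point.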

Train your own model

If your close button looks different from the ones in the given training set, you can train on your own images. Take a screenshot and put it in the "image/" folder.
Name each training image with the prefix "1" for a close button or "0" for background.
You can refer to the training images in the repo for examples.
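The naming convention can be expressed as a small helper that derives each image's label from its file-name prefix. This is a hypothetical sketch (`label_from_filename` and `list_training_images` are not functions from the repo) that assumes names follow the `<label>_<index>.png` pattern of the bundled images.

```python
import os
from typing import List, Tuple


def label_from_filename(filename: str) -> int:
    """Return 1 for a close-button image ("1_*.png"), 0 for background ("0_*.png")."""
    prefix = os.path.basename(filename).split("_", 1)[0]
    if prefix not in ("0", "1"):
        raise ValueError(f"unexpected training image name: {filename}")
    return int(prefix)


def list_training_images(folder: str = "image") -> List[Tuple[str, int]]:
    """Pair every .png in the folder with its label (hypothetical helper)."""
    return [
        (os.path.join(folder, name), label_from_filename(name))
        for name in sorted(os.listdir(folder))
        if name.endswith(".png")
    ]
```

Raising on an unexpected prefix catches a mislabeled file before it silently ends up in the wrong class.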

Button image named 1_1.png:

Background image named 0_3.png:

The repo includes these images for training:

0_0.png 0_1.png 0_2.png 0_3.png 0_4.png 0_5.png 0_6.png 1_0.png 1_1.png 1_2.png 1_3.png 1_4.png 1_5.png 1_6.png

To augment your images, run this method in "rcnn_train.py":

Image().get_augmentation()

To train the model on your images, run this method in "rcnn_train.py":

train_model()

Model layers and params

Model layers

The input image is converted from three color channels to a single channel, and every pixel value is set to either 255 or 0. This binarization helps the model classify well.
The model is small, with 196,450 parameters in total spread over four convolutional layers and two dense layers, so training is easy, and it is also robust across different Pop-up windows.
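The preprocessing described above can be sketched with NumPy. This is an illustration under assumptions, not the repo's code: the grayscale weights (ITU-R BT.601) and the threshold of 127 are choices made here, and the function name `binarize` is hypothetical. A single output channel is consistent with the first layer's 320 parameters (3 × 3 × 1 × 32 weights plus 32 biases) in the model summary.

```python
import numpy as np


def binarize(rgb: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Convert an H x W x 3 RGB image into an H x W x 1 image of 0/255 values."""
    # Luminance-weighted grayscale (BT.601 weights, an assumption of this sketch).
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    # Every pixel becomes either 255 (bright) or 0 (dark).
    binary = np.where(gray > threshold, 255, 0).astype(np.uint8)
    # Keep a trailing channel axis so the result can feed a Conv2D layer.
    return binary[..., np.newaxis]
```

Reducing each pixel to two values discards color and anti-aliasing detail, which is one plausible reason the same small model can generalize across Pop-ups with different color schemes.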

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 48, 48, 32)        320
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 46, 46, 32)        9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 23, 23, 32)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 21, 21, 64)        18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 10, 10, 64)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 64)          36928
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 4, 4, 64)          0
_________________________________________________________________
dropout_1 (Dropout)          (None, 4, 4, 64)          0
_________________________________________________________________
flatten_1 (Flatten)          (None, 1024)              0
_________________________________________________________________
dense_1 (Dense)              (None, 128)               131200
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 258
=================================================================
Total params: 196,450
Trainable params: 196,450
Non-trainable params: 0
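The shapes and parameter counts in the summary can be checked with a little arithmetic. The sketch assumes 3 × 3 kernels, 2 × 2 pooling, and no padding after the first convolution; these are inferred from the shapes and counts shown, not stated in the repo.

```python
def conv2d_params(kernel: int, c_in: int, c_out: int) -> int:
    # One kernel x kernel x c_in filter per output channel, plus one bias each.
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    # Full weight matrix plus one bias per output unit.
    return n_in * n_out + n_out


def trace_spatial(size: int) -> int:
    """Follow the feature-map width/height from conv2d_1's output to flatten_1."""
    for _ in range(3):
        size -= 2   # 3x3 convolution without padding (conv2d_2, conv2d_3, conv2d_4)
        size //= 2  # 2x2 max pooling, odd remainder dropped
    return size


side = trace_spatial(48)  # 48 -> 46 -> 23 -> 21 -> 10 -> 8 -> 4
flat = side * side * 64   # length of the flatten_1 output

params = {
    "conv2d_1": conv2d_params(3, 1, 32),  # single-channel (binarized) input
    "conv2d_2": conv2d_params(3, 32, 32),
    "conv2d_3": conv2d_params(3, 32, 64),
    "conv2d_4": conv2d_params(3, 64, 64),
    "dense_1": dense_params(flat, 128),
    "dense_2": dense_params(128, 2),      # two outputs: background / close button
}
total = sum(params.values())  # pooling, dropout and flatten add no parameters
print(flat, total)  # 1024 196450
```

More than two thirds of the parameters sit in `dense_1`, so the flattened size (and therefore the input resolution) dominates the model's size.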

Training params

all_config.py defines two training parameters, batch_size and epochs:

batch_size is the number of training images used for each update of the model.
epochs is the number of times training passes over all the training images.
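How the two parameters combine can be shown with a short calculation: each batch produces one weight update, and each epoch runs through every batch once. The sketch assumes the last, partial batch is still used, as Keras `Model.fit` does; the function name `training_updates` and the example numbers are hypothetical.

```python
import math


def training_updates(n_images: int, batch_size: int, epochs: int) -> int:
    """Total number of weight updates for one training run."""
    # One update per batch; the final batch may hold fewer than batch_size images.
    steps_per_epoch = math.ceil(n_images / batch_size)
    # Every epoch passes over all the training images once.
    return steps_per_epoch * epochs
```

For example, 140 images with batch_size 32 for 10 epochs gives 5 updates per epoch and 50 in total.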

Performance

On an Intel Core i7 CPU at 2.2 GHz:

Training a model for 10 epochs takes 30 s.
Predicting on a 1080p screenshot of a mobile Pop-up window takes 10 s.

Reference

The R-CNN model is based on this paper.

Meituan-Dianping/vision-ml
