Machine learning is gaining popularity, but because training models is labor-intensive, many Android developers approach it with reluctance. Google has made the task easier with the introduction of ML Kit. In this article we will briefly look at what ML Kit can do and write an application that recognizes text in an image and copies it to the clipboard.
Briefly about the features of ML Kit
ML Kit is a mobile SDK from Google that lets you use machine learning on Android and iOS.
For beginners in machine learning, Google provides ready-made models; experts can create custom models with TensorFlow Lite and plug them in.
With the Vision APIs, ML Kit can:
1. Recognize text
2. Detect faces
3. Scan barcodes
4. Label images
5. Detect and track objects
6. Recognize landmarks
You can also work with NLP (Natural Language Processing):
1. Identify the language of a text
2. Translate text
3. Generate smart replies to messages
In addition, you can create your own custom models, train them yourself, and use them in your app.
Text recognition (Implementation)
The task: recognize the text in a photo (we will pick photos from the gallery), then copy that text to the clipboard.
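Before any of this works, the app needs the Firebase ML Vision dependency in the module-level build.gradle (the version shown here is an assumption current at the time of writing; use the latest available):

```groovy
dependencies {
    // Firebase ML Kit Vision APIs - text recognition lives here
    implementation 'com.google.firebase:firebase-ml-vision:24.0.3'
}
```

The project must also be connected to Firebase (google-services.json plus the google-services Gradle plugin).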
Specify the metadata in the manifest inside <application> so that the models are downloaded as soon as the app is installed from Google Play; otherwise ML Kit will download them in the background on first use.
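For on-device text recognition that meta-data entry looks like this (the `ocr` value asks Google Play to fetch the text-recognition model at install time):

```xml
<application ...>
    <meta-data
        android:name="com.google.firebase.ml.vision.DEPENDENCIES"
        android:value="ocr" />
</application>
```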
Open MainActivity.kt, add a click listener to choose_image_from_gallery_btn, and load the chosen image into image_holder (PERMISSION_CODE and IMAGE_PICK_CODE are arbitrary request-code constants defined in the activity's companion object):
choose_image_from_gallery_btn.setOnClickListener {
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M &&
        checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) ==
        PackageManager.PERMISSION_DENIED
    ) {
        // Runtime permission not granted yet - ask for it
        val permissions = arrayOf(Manifest.permission.READ_EXTERNAL_STORAGE)
        requestPermissions(permissions, PERMISSION_CODE)
    } else {
        // Permission already granted (or a pre-Marshmallow device)
        pickImageFromGallery()
    }
}
private fun pickImageFromGallery() {
    val intent = Intent(Intent.ACTION_PICK)
    intent.type = "image/*"
    startActivityForResult(intent, IMAGE_PICK_CODE)
}
override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
    super.onActivityResult(requestCode, resultCode, data)
    if (resultCode == RESULT_OK && requestCode == IMAGE_PICK_CODE) {
        image_holder.setImageURI(data?.data)
    }
}
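The click listener above requests READ_EXTERNAL_STORAGE but never reacts to the user's answer. A minimal sketch of the missing callback, assuming the same PERMISSION_CODE constant:

```kotlin
override fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<out String>,
    grantResults: IntArray
) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults)
    if (requestCode == PERMISSION_CODE &&
        grantResults.isNotEmpty() &&
        grantResults[0] == PackageManager.PERMISSION_GRANTED
    ) {
        // User granted access - open the gallery right away
        pickImageFromGallery()
    }
}
```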
We got the image, now we need to recognize the text and output it to our detected_text_view.
fun recognizeText(view: View) {
    if (image_holder.drawable == null) {
        Toast.makeText(this, R.string.image_null, Toast.LENGTH_LONG).show()
        return
    }
    recognizeTextFromDevice()
}
private fun recognizeTextFromDevice() {
    // Get an instance of the on-device FirebaseVisionTextRecognizer
    val detector = FirebaseVision.getInstance().onDeviceTextRecognizer
    val textImage = FirebaseVisionImage.fromBitmap((image_holder.drawable as BitmapDrawable).bitmap)
    detector.processImage(textImage)
        .addOnSuccessListener { firebaseVisionText ->
            // Format the recognized text and show it
            detected_text_view.text = TextProcessor.getInstance().process(firebaseVisionText)
        }
        .addOnFailureListener { e ->
            // Recognition failed - report the reason to the user
            Toast.makeText(this, e.localizedMessage, Toast.LENGTH_LONG).show()
        }
}
As you may have noticed, we do the recognition on the device. Cloud recognition would require handling the network connection and a billing account in the Firebase Console (free at low volumes); the trade-off is that the on-device model could not recognize Russian text (at least I could not get it to).
Let's create a singleton TextProcessor class, but first a bit of theory.
Each recognized text (FirebaseVisionText) consists of blocks (TextBlock); blocks are made up of lines (Line), and lines are made up of elements (Element), roughly individual words.
In the example below, blocks are shown in red, lines are shown in blue, and elements are shown in purple
Now let's create the process() method. I decided to write this class in Java:
public String process(FirebaseVisionText firebaseVisionText) {
    StringBuilder resultText = new StringBuilder();
    for (FirebaseVisionText.TextBlock block : firebaseVisionText.getTextBlocks()) {
        for (FirebaseVisionText.Line line : block.getLines()) {
            resultText.append(line.getText()).append("\n");
        }
        resultText.append("\n");
    }
    return resultText.toString().trim();
}
Here a line break is appended after each line, and an extra one after each block, so the end of every "paragraph" stays visible in the output.
All that remains is to copy the recognized text to the clipboard:
private fun copyTextToClipBoard() {
    val clipboardManager = getSystemService(Context.CLIPBOARD_SERVICE) as ClipboardManager
    val srcText: String = detected_text_view.text.toString()
    val clipData = ClipData.newPlainText("Source Text", srcText)
    clipboardManager.setPrimaryClip(clipData)
}
You can use this method when you click on detected_text_view or when you click a separate button.
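For example, wiring it to a tap on the text view might look like this (the `R.string.copied` resource is an assumption; add your own string):

```kotlin
detected_text_view.setOnClickListener {
    copyTextToClipBoard()
    // Give the user feedback that the copy happened
    Toast.makeText(this, R.string.copied, Toast.LENGTH_SHORT).show()
}
```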
Checking the app
Small conclusion
We created a simple app that recognizes text from photos and copies it to the clipboard, all with a small amount of code. You could improve it by saving the results of previous recognitions or by adding photo cropping.