Machine learning is gaining popularity, but because training models is labor-intensive, Android developers often approach it with reluctance. Google, however, has made this task easier with ML Kit. In this article, we will briefly look at the capabilities of ML Kit and write an application that recognizes text in an image and copies it to the clipboard.
Briefly about the features of ML Kit
ML Kit is a mobile SDK from Google that lets you use machine learning on Android and iOS.
For beginners in machine learning, Google provides ready-made models; experts can create custom ones with TensorFlow Lite.
ML Kit lets us work with images via the Vision API:
1. Recognize text
2. Detect faces
3. Scan barcodes
4. Label images
5. Detect and track objects
6. Recognize landmarks
You can also work with natural language processing (NLP):
1. Identify the language of a text (a short sketch follows this list)
2. Translate text
3. Generate smart replies to messages
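For a taste of the NLP side, here is a minimal language identification sketch in Kotlin. It is not part of this article's app and assumes the firebase-ml-natural-language dependencies have been added to the project:

val languageIdentifier = FirebaseNaturalLanguage.getInstance().languageIdentification
languageIdentifier.identifyLanguage("Здравствуй, мир")
    .addOnSuccessListener { languageCode ->
        // Returns a BCP-47 code such as "ru", or "und" if the language is undetermined
        Log.d("LangId", "Language: $languageCode")
    }
    .addOnFailureListener {
        // Model download or identification failed
    }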
In addition, you can create your own custom models, train them yourself, and use them in your app.
Text recognition (Implementation)
The task: recognize the text in a photo (we will take photos from the gallery), then copy that text to the clipboard.
Before you start, connect Firebase to the project (if you haven't done this before).
First, let's lay out the markup in activity_main:
<ImageView
    android:id="@+id/image_holder"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:scaleType="centerCrop"
    android:fitsSystemWindows="true"
    app:layout_collapseMode="parallax"
    android:contentDescription="@string/image_for_recognition" />

<!-- ... -->

<TextView
    android:id="@+id/detected_text_view"
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:layout_gravity="center|top"
    android:textAlignment="center" />

<!-- ... -->

<com.google.android.material.floatingactionbutton.FloatingActionButton
    android:id="@+id/choose_image_from_gallery_btn"
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:layout_margin="@dimen/activity_horizontal_margin"
    android:src="@drawable/ic_gallery_24dp"
    app:layout_anchor="@id/main.appbar"
    app:layout_anchorGravity="bottom|right|end"
    app:rippleColor="@color/colorSecondary" />
image_holder – where we will place the image from the gallery
detected_text_view – the text recognized from the image
And the markup in bottom_sheet:
<Button
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:layout_weight="0.5"
    android:text="@string/text_button"
    android:onClick="recognizeText"
    android:textAllCaps="false"
    style="@style/RoundedCornerButton" />

<Button
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:layout_weight="0.5"
    android:text="@string/copy_button"
    android:onClick="copyText"
    android:textAllCaps="false"
    style="@style/RoundedCornerButton" />
Add the text recognition dependency to the project:
implementation 'com.google.firebase:firebase-ml-vision:24.0.3'
Specify the metadata in the manifest inside <application> so that the models are downloaded from Google Play when the app is installed; otherwise they will be downloaded in the background the first time they are needed.
<meta-data android:name="com.google.firebase.ml.vision.DEPENDENCIES" android:value="text" />
Open MainActivity.kt, add a click listener for choose_image_from_gallery_btn, and insert the chosen image into our image_holder:
choose_image_from_gallery_btn.setOnClickListener {
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
        if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE)
            == PackageManager.PERMISSION_DENIED) {
            val permissions = arrayOf(Manifest.permission.READ_EXTERNAL_STORAGE)
            requestPermissions(permissions, PERMISSION_CODE)
        } else {
            pickImageFromGallery()
        }
    } else {
        // Before Android 6.0 permissions are granted at install time
        pickImageFromGallery()
    }
}

private fun pickImageFromGallery() {
    val intent = Intent(Intent.ACTION_PICK)
    intent.type = "image/*"
    startActivityForResult(intent, IMAGE_PICK_CODE)
}

override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
    super.onActivityResult(requestCode, resultCode, data)
    if (resultCode == RESULT_OK && requestCode == IMAGE_PICK_CODE) {
        image_holder.setImageURI(data?.data)
    }
}

companion object {
    // Request codes referenced above (the values are arbitrary)
    private const val IMAGE_PICK_CODE = 1000
    private const val PERMISSION_CODE = 1001
}
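One detail the snippet above leaves out is reacting to the user's answer to the permission dialog. A minimal sketch, reusing the same PERMISSION_CODE request code:

override fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<out String>,
    grantResults: IntArray
) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults)
    // Open the gallery only after the user has granted storage access
    if (requestCode == PERMISSION_CODE &&
        grantResults.isNotEmpty() &&
        grantResults[0] == PackageManager.PERMISSION_GRANTED) {
        pickImageFromGallery()
    }
}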
We've got the image; now we need to recognize the text and output it to our detected_text_view.
fun recognizeText(view: View) {
    if (image_holder.drawable == null) {
        Toast.makeText(this, R.string.image_null, Toast.LENGTH_LONG).show()
        return
    }
    recognizeTextFromDevice()
}

private fun recognizeTextFromDevice() {
    // Getting an on-device FirebaseVisionTextRecognizer instance
    val detector = FirebaseVision.getInstance().onDeviceTextRecognizer
    val textImage = FirebaseVisionImage.fromBitmap((image_holder.drawable as BitmapDrawable).bitmap)
    detector.processImage(textImage)
        .addOnSuccessListener { firebaseVisionText ->
            // Processing the recognized text
            detected_text_view.text = TextProcessor.getInstance().process(firebaseVisionText)
        }
        .addOnFailureListener {
            // Handling the error
        }
}
As you may have noticed, we run recognition on the device. With the cloud recognizer we would have to handle the network connection and internet availability on the device, and we would also need a billing account in the Firebase Console (it has a free tier). On the other hand, on-device recognition could not handle Russian text (at least in my case).
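For reference, if you do set up billing, switching to the cloud recognizer is a small change. A sketch under those assumptions — the "ru" language hint and the method name are my illustration, not something from this app:

private fun recognizeTextInCloud() {
    // The cloud model supports many more languages than the on-device one;
    // the "ru" hint is an illustrative assumption
    val options = FirebaseVisionCloudTextRecognizerOptions.Builder()
        .setLanguageHints(listOf("ru"))
        .build()
    val detector = FirebaseVision.getInstance().getCloudTextRecognizer(options)
    val textImage = FirebaseVisionImage.fromBitmap((image_holder.drawable as BitmapDrawable).bitmap)
    detector.processImage(textImage)
        .addOnSuccessListener { firebaseVisionText ->
            detected_text_view.text = TextProcessor.getInstance().process(firebaseVisionText)
        }
        .addOnFailureListener {
            // Requires a network connection and a billing account in the Firebase Console
        }
}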
Let’s create a singleton TextProcessor class, but first let’s understand the theory.
Each recognized text (FirebaseVisionText) consists of blocks (TextBlock); blocks are made up of lines (Line), and lines are made up of elements (Element) — in Latin-based scripts, roughly words.
In the example below, blocks are shown in red, lines in blue, and elements in purple.
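To make the hierarchy concrete, here is a small Kotlin sketch (not part of the app) that walks all three levels and logs each element with its bounding box:

fun dumpStructure(firebaseVisionText: FirebaseVisionText) {
    for (block in firebaseVisionText.textBlocks) {       // red in the example
        Log.d("Structure", "Block: ${block.text}")
        for (line in block.lines) {                      // blue
            Log.d("Structure", "  Line: ${line.text}")
            for (element in line.elements) {             // purple
                Log.d("Structure", "    Element: ${element.text} at ${element.boundingBox}")
            }
        }
    }
}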
Now let's create the process() method, shown here together with the singleton scaffolding behind the TextProcessor.getInstance() call used above. I decided to write this class in Java.
public class TextProcessor {
    private static TextProcessor instance;

    // A simple (not thread-safe) singleton accessor
    public static TextProcessor getInstance() {
        if (instance == null) {
            instance = new TextProcessor();
        }
        return instance;
    }

    public String process(FirebaseVisionText firebaseVisionText) {
        StringBuilder resultText = new StringBuilder();
        for (FirebaseVisionText.TextBlock block : firebaseVisionText.getTextBlocks()) {
            for (FirebaseVisionText.Line line : block.getLines()) {
                resultText.append(line.getText()).append("\n");
            }
            resultText.append("\n");
        }
        return resultText.toString().trim();
    }
}
Here we append a line break after each line and an extra one after each block, so that the end of each “paragraph” stays visible.
It remains to add copying of the recognized text to the clipboard.
private fun copyTextToClipBoard() {
    val clipboardService = getSystemService(Context.CLIPBOARD_SERVICE)
    val clipboardManager: ClipboardManager = clipboardService as ClipboardManager
    val srcText: String = detected_text_view.text.toString()
    val clipData = ClipData.newPlainText("Source Text", srcText)
    clipboardManager.setPrimaryClip(clipData)
}
You can call this method on a tap on detected_text_view or from a separate button, as sketched below.
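Since the button in bottom_sheet declares android:onClick="copyText", we need a public method with a View parameter. A minimal sketch — the confirmation toast and its hardcoded "Copied" text are my additions:

// Called via android:onClick="copyText" in bottom_sheet
fun copyText(view: View) {
    if (detected_text_view.text.isNullOrEmpty()) {
        return // Nothing recognized yet
    }
    copyTextToClipBoard()
    // Hypothetical confirmation, not from the original article
    Toast.makeText(this, "Copied", Toast.LENGTH_SHORT).show()
}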
Checking the app
Small conclusion
We created a simple app that recognizes text from photos and copies it to the clipboard, all with a small amount of code. You could improve it by saving a history of previously recognized texts or by adding photo cropping.
Library documentation: ML Kit
App in Google Play: Text Line Recognizer
Project GitHub: Hawoline