Azure AI: Using the Azure AI Vision to Read Text from an Image

In this article, we will use Azure AI Vision to extract text from an image. Artificial Intelligence (AI) has emerged as one of the essential components of new-generation software applications. AI has a wide set of use cases across domains like Banking, Finance, Healthcare, Manufacturing, etc. It offers a wide range of features that are useful to everyone from enterprise application users (online shopping, banking, etc.) to everyday users of social media. Nowadays, most end users are looking for simple-to-use yet powerful applications for tasks such as reading text directly from an image or extracting a phone number from a business card. Several providers offer such AI services, and Azure AI is one of the best among them.

Azure AI

Azure AI is a comprehensive platform provided by Microsoft for building, deploying, and managing AI solutions. The platform offers several services, one of which is Computer Vision, now known as Azure AI Vision.

Azure AI Vision

Azure AI Vision offers a range of advanced capabilities for analyzing visual content. Here are some of its key features:

Image Analysis:

Extracts visual features from images, such as objects, faces, and auto-generated text descriptions.
This is effective when the application needs to extract text from an image, or to identify faces and objects and report their tags and locations.

Optical Character Recognition (OCR):

Extracts printed and handwritten text from images, supporting various languages and writing styles.
This is typically useful when the application needs to process a large amount of data from printed documents, e.g. invoices. I have already published an article on invoice processing, available at this link.

Facial Recognition:

Detects, recognizes, and analyzes human faces for applications like identification and touchless access control.

Video Analysis:

Includes features like spatial analysis to track movement and video retrieval for creating searchable video indexes.

In this article, we will use Azure AI Vision to extract text from an image. To perform this operation, we will use the Azure.AI.Vision.ImageAnalysis SDK package. The Azure AI Vision service gives you access to advanced algorithms that process images and return information based on the visual features you're interested in. Azure AI Vision can power many digital asset management (DAM) scenarios. DAM is the business process of organizing, storing, and retrieving rich media assets and managing digital rights and permissions for assets such as logos, faces, objects, and so on. Azure AI Vision can analyze images that meet the following requirements:

The image must be provided to the application in JPEG, PNG, GIF, or BMP format
The file size of the image must be less than 4 MB
The dimensions of the image must be greater than 50 x 50 pixels
To use the Read API, the image dimensions must be between 50 x 50 and 10,000 x 10,000 pixels.
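A client can check some of these limits before calling the service. The following is a minimal sketch, assuming the 4 MB size limit and the four formats listed above. ImagePreCheck and its members are hypothetical names, not part of the Azure SDK, and the pixel-dimension check is left out because it requires decoding the image.

```csharp
#nullable enable
using System;

// Hypothetical helper, not part of the Azure SDK.
public static class ImagePreCheck
{
    // Size limit quoted in the list above (the file must be smaller than 4 MB).
    public const long MaxBytes = 4L * 1024 * 1024;

    // Returns the format name based on the file's leading signature bytes,
    // or null when the signature is not one of the four listed formats.
    public static string? DetectFormat(ReadOnlySpan<byte> header)
    {
        if (header.Length >= 3 && header[0] == 0xFF && header[1] == 0xD8 && header[2] == 0xFF)
            return "JPEG";
        if (header.Length >= 4 && header[0] == 0x89 && header[1] == (byte)'P'
            && header[2] == (byte)'N' && header[3] == (byte)'G')
            return "PNG";
        if (header.Length >= 4 && header[0] == (byte)'G' && header[1] == (byte)'I'
            && header[2] == (byte)'F' && header[3] == (byte)'8')
            return "GIF";
        if (header.Length >= 2 && header[0] == (byte)'B' && header[1] == (byte)'M')
            return "BMP";
        return null;
    }

    // True when the bytes are under the size limit and carry a recognized signature.
    public static bool IsAcceptable(byte[] imageBytes) =>
        imageBytes.Length < MaxBytes && DetectFormat(imageBytes) is not null;
}
```

Calling ImagePreCheck.IsAcceptable() on the uploaded bytes before sending them lets the application reject an unsupported file without spending a service call on it.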

To analyze an image, we will use the Image Analysis feature of Azure AI Vision. This feature provides powerful tools for extracting information from images and offers the following capabilities:

Visual Feature Extraction
Optical Character Recognition
Brand Detection
Content Moderation

To complete the code in this article, an Azure subscription is needed, and an Azure AI Vision resource must be created as shown in Figure 1:

Figure 1: Create Azure AI Service

Once you click on Azure AI Services, a new page opens where you can choose the Computer vision option, as shown in Figure 2.

Figure 2: The Computer Vision Service

To create the Computer Vision service, click on the Create link and enter the details for the various options as shown in Figure 3.

Figure 3: Creating Computer Vision Service

Once the service is created, the client application must authenticate against it using a key. Copy the Key and Endpoint from the Keys and Endpoint option under Resource Management of the service, as shown in Figure 4.

Figure 4: Keys and Endpoint

Now that the service is ready, it is time for us to create the client application. I have created the client application as a Blazor WebAssembly app.

Step 1: Open Visual Studio and create a new Blazor WebAssembly application. Name this application Blazor_AI_ComputerViosn_ImageTableReader. In this application, add the Azure AI NuGet package named Azure.AI.Vision.ImageAnalysis.

Step 2: In this project, add a new JSON file named config.json to the wwwroot folder. In this file, we will add the Vision service key and endpoint that we copied as shown in Figure 4. Listing 1 shows the config.json file.

{
  "AzureComputerVisionAIServiceEndpoint": "[AZURE-VISION-SERVICE-ENDPOINT]",
  "AzureComputerVisionAIServiceKey": "[AZURE-VISION-SERVICE-KEY]"
}

Listing 1: config.json

Modify Program.cs to read the config.json file by adding the code as shown in Listing 2. This code must be added before the RunAsync() method call.

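// Fetch config.json from wwwroot and add it to the application configuration.
using var http = new HttpClient { BaseAddress = new Uri(builder.HostEnvironment.BaseAddress) };
using var response = await http.GetAsync("config.json");
response.EnsureSuccessStatusCode(); // fail fast if the file is missing
using var stream = await response.Content.ReadAsStreamAsync();
builder.Configuration.AddJsonStream(stream);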

Listing 2: Code for reading config.json file

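As shown in Listing 2, the config.json file is fetched with an HttpClient object and added to the application configuration as a stream. Because every file under wwwroot can be downloaded by the browser, a key stored this way is visible to anyone who can load the app, so this setup suits a demo rather than production.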
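To see how the flat keys from Listing 1 become available through IConfiguration, here is a small sketch that builds a configuration from JSON text held in memory instead of an HTTP response. ConfigSketch is a hypothetical name, the key value is a placeholder, and the sketch assumes the Microsoft.Extensions.Configuration.Json package is referenced.

```csharp
using System;
using System.IO;
using System.Text;
using Microsoft.Extensions.Configuration;

// Hypothetical helper for illustration only.
public static class ConfigSketch
{
    // Builds a configuration from JSON text through AddJsonStream(),
    // the same extension method that Listing 2 calls on the HTTP stream.
    public static IConfiguration FromJson(string json)
    {
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));

        // Build() loads the stream, so the stream must still be open here.
        return new ConfigurationBuilder().AddJsonStream(stream).Build();
    }
}
```

With this helper, ConfigSketch.FromJson("{ \"AzureComputerVisionAIServiceKey\": \"placeholder-key\" }")["AzureComputerVisionAIServiceKey"] yields the placeholder value, while a key that is absent from the JSON yields null. That is why the processor class later checks both values with string.IsNullOrEmpty().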

Step 3: In the application, add a new folder named AIProcessor. In this folder, add a new class file named AIImageProcessor.cs containing the AIImageProcessor class. The class receives IConfiguration through constructor injection and uses it to read the keys from config.json. It has a method named UploadAndProcessImageAsync() with the following specifications:

Accepts a Stream object, which represents the uploaded file.
Reads the key and endpoint from config.json so that the Blazor application can authenticate against the Azure AI Vision service.
To access the service, the method uses the ImageAnalysisClient class. ImageAnalysisClient is part of Azure AI services, specifically the Computer Vision suite. It provides powerful AI algorithms for processing images and extracting information about their visual features. The class has a constructor that accepts the endpoint URI and an AzureKeyCredential object as input parameters. The URI is used to connect to the Azure AI Vision service, and the AzureKeyCredential object carries the service key used to authenticate. The class has the following capabilities:

Image Captioning
Object Detection
Optical Character Recognition
Content Tagging

The AnalyzeAsync() method of the ImageAnalysisClient class accepts the image as a BinaryData object and a VisualFeatures value as input parameters. The BinaryData.FromStream() method converts the image stream to BinaryData. VisualFeatures is an enum that specifies which visual features to analyze in the image. It includes several options, such as:

Dense Captions: Generates human-readable captions for different regions in the image.
Objects: Detects physical objects in the image and returns their locations.
People: Identifies and locates people in the image.
Read: Extracts printed or handwritten text from the image. For the current article we will be using this value.
Smart Crops: Suggests optimal cropping regions for the image.
Tags: Provides tags that describe the content of the image.
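Several of these values can be requested in a single call, because VisualFeatures is a flags enum. The sketch below combines Read and Tags, assuming an ImageAnalysisClient and the image bytes are created elsewhere. CombinedFeaturesSketch and SummarizeAsync are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Azure;
using Azure.AI.Vision.ImageAnalysis;

// Hypothetical helper for illustration only.
public static class CombinedFeaturesSketch
{
    // Flags can be OR-ed together so one request returns both results.
    public static readonly VisualFeatures Features = VisualFeatures.Read | VisualFeatures.Tags;

    public static async Task<string> SummarizeAsync(ImageAnalysisClient client, BinaryData image)
    {
        Response<ImageAnalysisResult> response = await client.AnalyzeAsync(image, Features);
        ImageAnalysisResult result = response.Value;

        var summary = new StringBuilder();

        // Tags describe the image content, each with a confidence score.
        foreach (DetectedTag tag in result.Tags.Values)
            summary.AppendLine($"tag: {tag.Name} ({tag.Confidence:F2})");

        // Read holds the extracted text, grouped into blocks and lines.
        foreach (DetectedTextBlock block in result.Read.Blocks)
            foreach (DetectedTextLine line in block.Lines)
                summary.AppendLine($"text: {line.Text}");

        return summary.ToString();
    }
}
```

Only the result properties that match the requested features are populated, so the code reads Tags and Read and nothing else.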

The AnalyzeAsync() method returns a Response<ImageAnalysisResult>. Its Value property exposes the ImageAnalysisResult, whose Read property, of type ReadResult, holds the extracted printed and handwritten text. The ReadResult class has a Blocks property, a list of DetectedTextBlock objects, each representing a single block of text detected in the image. Each block has a Lines property, a list of DetectedTextLine objects, each representing a single line of text detected in the image.
The UploadAndProcessImageAsync() method returns a string containing the text extracted from the image.
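The result hierarchy goes one level deeper than lines: each DetectedTextLine also exposes its individual words with a confidence score. The following sketch walks blocks, lines, and words, and collects the words the service was less sure about. ReadResultSketch, WordsBelow, and the threshold parameter are hypothetical names for illustration.

```csharp
using System.Collections.Generic;
using Azure.AI.Vision.ImageAnalysis;

// Hypothetical helper for illustration only.
public static class ReadResultSketch
{
    // Returns "word (confidence)" entries for every word whose confidence
    // is below the given threshold, e.g. to highlight them for manual review.
    public static List<string> WordsBelow(ReadResult read, float minConfidence)
    {
        var uncertain = new List<string>();
        foreach (DetectedTextBlock block in read.Blocks)
            foreach (DetectedTextLine line in block.Lines)
                foreach (DetectedTextWord word in line.Words)
                    if (word.Confidence < minConfidence)
                        uncertain.Add($"{word.Text} ({word.Confidence:F2})");
        return uncertain;
    }
}
```

A call such as ReadResultSketch.WordsBelow(result.Value.Read, 0.8f) could run after AnalyzeAsync(); the 0.8 threshold is an arbitrary example value, not a recommendation from the service.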

The code of the UploadAndProcessImageAsync() method is shown in Listing 3.

using System.Text;
using Azure;
using Azure.AI.Vision.ImageAnalysis;
using Microsoft.Extensions.Configuration;

namespace Blazor_AI_ComputerViosn_ImageTableReader.AIProcessor
{
    /// <summary>
    /// Class for Processing the Uploaded Image
    /// </summary>
    public class AIImageProcessor(IConfiguration configuration)
    {
        private readonly IConfiguration _configuration = configuration;

        public async Task<string> UploadAndProcessImageAsync(Stream stream)
        {
            var key = _configuration["AzureComputerVisionAIServiceKey"];
            var endpoint = _configuration["AzureComputerVisionAIServiceEndpoint"];

            if (string.IsNullOrEmpty(key) || string.IsNullOrEmpty(endpoint))
            {
                // A missing setting is a configuration problem, not a null argument
                throw new InvalidOperationException("Azure Computer Vision AI Service key or endpoint is not configured properly.");
            }

            var client = new ImageAnalysisClient(new Uri(endpoint), new AzureKeyCredential(key));

            // Request only the Read (OCR) feature for the uploaded image
            var result = await client.AnalyzeAsync(BinaryData.FromStream(stream), VisualFeatures.Read);

            // Collect every detected line, one per output line
            var text = new StringBuilder();
            foreach (DetectedTextBlock block in result.Value.Read.Blocks)
            {
                foreach (DetectedTextLine line in block.Lines)
                {
                    text.AppendLine(line.Text);
                }
            }
            return text.ToString();
        }

    }
}

Listing 3: The UploadAndProcessImageAsync() method

As shown in Listing 3, the UploadAndProcessImageAsync() method returns a string containing the text extracted from the image.

Step 4: In the Pages folder of the project, add a new Razor component named ImageTableReader. In this component, add the code shown in Listing 4.
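Calls to the service can fail, for example when the key is wrong or the image is rejected. The Azure SDK reports such failures as RequestFailedException. The sketch below wraps any analysis call and turns the exception into a readable message. SafeAnalyze and RunAsync are hypothetical names, and showing the error as text is only one possible design.

```csharp
using System;
using System.Threading.Tasks;
using Azure;

// Hypothetical wrapper for illustration only.
public static class SafeAnalyze
{
    // Runs the given analysis call and converts a service failure into a message
    // that can be shown in the UI instead of crashing the component.
    public static async Task<string> RunAsync(Func<Task<string>> analyze)
    {
        try
        {
            return await analyze();
        }
        catch (RequestFailedException ex)
        {
            // Status is the HTTP status code; ErrorCode may be null when the service sends none.
            return $"Vision request failed (HTTP {ex.Status}, code {ex.ErrorCode ?? "n/a"}): {ex.Message}";
        }
    }
}
```

A component could call it as SafeAnalyze.RunAsync(() => ImageProcessor.UploadAndProcessImageAsync(memoryStream)), so that an authentication or validation error appears in the text area rather than as an unhandled exception.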

@page "/imagetablereader"
@using AIProcessor
@inject HttpClient Http
@inject IConfiguration Configuration
@inject AIImageProcessor ImageProcessor

@code {

    // OpenReadStream() allows only 512,000 bytes by default; raise it to the 4 MB service limit
    private const long MaxFileSize = 4 * 1024 * 1024;

    public List<string> lstData { get; set; } = new List<string>();
    public string ImageText { get; set; } = string.Empty;
    private string? imageDataUrl;

    private async Task OnInputFileChange(InputFileChangeEventArgs e)
    {
        var file = e.File;

        // Build a resized PNG preview; copy the whole stream, since a single ReadAsync() may return early
        var resizedImageFile = await file.RequestImageFileAsync("image/png", 300, 500);
        using var previewStream = new MemoryStream();
        await resizedImageFile.OpenReadStream(MaxFileSize).CopyToAsync(previewStream);
        imageDataUrl = $"data:image/png;base64,{Convert.ToBase64String(previewStream.ToArray())}";

        // Send the original file to Azure AI Vision
        using var memoryStream = new MemoryStream();
        await file.OpenReadStream(MaxFileSize).CopyToAsync(memoryStream);
        memoryStream.Position = 0;
        ImageText = await ImageProcessor.UploadAndProcessImageAsync(memoryStream);
    }

}

Listing 4: The Razor Component code

The code in Listing 4 shows that the OnInputFileChange() method accepts the selected file and processes it. A resized copy of the file is read into memory and converted to a Base64 data URL so that the image can be previewed. The original file is then copied into a MemoryStream, which is passed to the UploadAndProcessImageAsync() method of the AIImageProcessor class. This class is injected into the component along with the IConfiguration interface.

Modify the component by adding the HTML UI shown in Listing 5.

<h3>Image Table Reader</h3>
<h4>
    Upload the Image to read Text from it
</h4>

<div class="container alt altwarning">
    <div class="row">
        <div class="col-md-6">
            
            <InputFile OnChange="OnInputFileChange" />
            @if (!string.IsNullOrEmpty(imageDataUrl))
            {
                <img src="@imageDataUrl" alt="Uploaded Image" style="height:400px;width:1000px;" />
            }
        </div>
    </div>
    <br/>
    <div class="alert alert-warning">
        <InputTextArea @bind-Value="ImageText" style="height:400px;width:1000px;overflow:scroll"></InputTextArea>
    </div>
</div>

Listing 5: The HTML UI for the component

The UI uses the InputFile Blazor component, which is the standard component for choosing a local file. Its OnChange event is bound to the OnInputFileChange() method, which runs when an image is chosen. The img tag shows the uploaded image, and the InputTextArea shows the text extracted from it.

Finally, register the AIImageProcessor class in the dependency container in Program.cs, as shown in Listing 6.

.........

builder.Services.AddScoped<AIImageProcessor>();

.........

Listing 6: The Dependency Registration of AIImageProcessor class

Also, modify NavMenu.razor in the Layout folder to add a navigation link to the ImageTableReader component, as shown in Listing 7.

 <div class="nav-item px-3">
     <NavLink class="nav-link" href="imagetablereader">
         <span class="bi bi-list-nested-nav-menu" aria-hidden="true"></span> Process Image
     </NavLink>
 </div>

Listing 7: The Route for ImageTableReader component

Run the application and, in the browser, click on the Process Image link. The component will be loaded in the browser as shown in Figure 5.

Figure 5: The ImageTableReader Component loaded

Choose an image file. Its text will be extracted and the image will be displayed, as shown in Figure 6.

Figure 6: The Image and extracted contents from it

That's it.

The code for this article can be downloaded from this link.

Conclusion: Azure AI Vision is one of the most powerful services for extracting content from images and processing it.
