Top 5 tips for users of Microsoft Cognitive Services

Introduction and Background

As the title suggests, this post is a personal set of recommendations for users of Microsoft Cognitive Services, the cloud-based, subscription-based services for building artificially intelligent software applications, offered with an “any team, any purpose, any scale” commitment. We are all aware that Microsoft is investing a lot of manpower, promotion and commitment in Azure nowadays. Almost every one of their solutions revolves around Azure, and one way or another they all come back to the same conclusion: the solution can be purchased as a service from Azure. Call it Software-as-a-Service, Platform-as-a-Service, or whichever name you like from the pool.

In this post, I am going to cover the most important points that your team should understand before migrating to Microsoft Cognitive Services.

Background of Microsoft Cognitive Services

For those of you who have no idea what Microsoft Cognitive Services are: they are a bundle of services, provided by Microsoft to individuals, teams, and organizations of any size, for applications that require complex machine learning or artificially intelligent responses.

Machine learning is tough to get right, and with just one wrong input your entire algorithm can go haywire. Microsoft provides a service where you only have to supply the inputs for the algorithm, and you get the output. Microsoft itself manages how the algorithms are fine-tuned and how they perform; you don’t have to worry about that.

It is a subscription-based service, now provided through Azure. In this post, you will find out how likely it is that Cognitive Services can help you!

Tip #0: Ask (Convince) your boss

Microsoft Cognitive Services are tested against thousands (if not millions) of users, data records and entities, and the algorithms are really solid! You are unlikely to match, on your own, the level at which Microsoft Cognitive Services operate. The reason is that Microsoft has partnered with quite a lot of academics, professors, indie developers, teams and organizations, and often even ordinary web users contribute data to the cloud. All of this happens under a license and Microsoft asks for permission, but I am not here to cover the license terms anyway.

Get the permissions, so that we may continue with this post. 🙂

Tip #1: Take only what you need

Cognitive Services is a library of services. There are a lot of services already in the library, and more are being added every month. But that doesn’t mean you should consider all of them, or even half of them. They are grouped into categories, each containing a set of related services provided by Microsoft CS:

  1. Vision
    • This set of services contains the Face APIs, such as recognition and tracking.
    • It also provides services that can extract features from faces, such as age and emotion.
    • It also provides Computer Vision, which lets users perform OCR on images.
  2. Speech
    • Allows your users to trigger functions with voice commands (natural language processing).
    • Speaker Recognition — bleeding-edge technology!
    • Speech to text, and text to speech services.
  3. Language
    • Lets you perform linguistic analysis of text.
    • You can combine this with the previous services to analyze photos as well: read the text using OCR and then analyze it.
    • LUIS (Language Understanding Intelligent Service) is the new Jarvis!
  4. Knowledge
    • Recommender systems.
    • Anything that requires complex academic or research work.
  5. Search
    • The old Bing APIs are now provided here…

As you can see, these are the categories, and each category has its own API sets and services that you might want to consume. It is up to you to select which ones you need.

Let me put this simply: if all you need to do is read the text from images, convert it to speech and communicate, then all you need to purchase is the “Computer Vision API” and the “Bing Speech API”. Your application won’t need the rest of the services. LUIS can be added later to support the communication.

There will be more services, and you can always add more. But if you are no longer using a service, or your application has nothing to do with a service, there is no need to purchase a key for it.

Tip #2: Keep everything in Azure

Microsoft CS are provided through different sites (all Microsoft properties); LUIS, for example, can be accessed through luis.ai. But you should keep the family tight, and keep all of the keys and resources on Azure, so that you can manage everything from a single subscription instead of having to look at various accounts to configure and consume the services.

Microsoft CS supports a REST-based API (we will cover this in a later tip below), so it is very easy to add the key to the request and start consuming the services.

You can manage all of the keys from within Azure: just head over to the Cognitive Services blade and open the resource that you want the keys for. Under the “Keys” section, you will find the keys that you can use to authenticate the requests.


Figure 1: List of the Cognitive Services associated with the account.

I have 4 services active, which I can access in Azure through REST APIs. How simple is that! You can add more keys, add more services, update the keys… all from within Azure! By the end of this post, you will realize the importance of this tip.

Tip #3: Get the most out of the REST API

Microsoft CS Azure endpoints are provided as REST API endpoints that you can access through any HTTP client, even a web browser. Since the REST API works over HTTP, you can use everything HTTP offers to send and receive information. Currently, Microsoft CS supports two ways of uploading the information to the cloud:

  1. URL based
  2. Binary data based

These are the two ways that you can deliver the content to Azure for processing. Apart from this, the only required header for the request is the subscription key, sent in the “Ocp-Apim-Subscription-Key” header. It is checked first, and the rest of the request is then processed based on the subscription information.
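The two upload styles can be sketched side by side. This is a minimal sketch, not the official client: the class and method names (`VisionRequests`, `FromUrl`, `FromBytes`) are made up for illustration, the endpoint is the one used in the example later in this post, and the `{"url": ...}` JSON body for the URL-based variant is my understanding of what the Computer Vision API expects. It also assumes `System.Text.Json` is available (.NET Core 3.0 or later, or the NuGet package).

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Hypothetical helper: builds the two kinds of request described above.
public static class VisionRequests
{
    // Endpoint from the example in this post; your region and version may differ.
    public const string AnalyzeUri =
        "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze?visualFeatures=Description";

    // URL based: the body is a small JSON document pointing at a reachable image.
    public static HttpRequestMessage FromUrl(string subscriptionKey, string imageUrl)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, AnalyzeUri);
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        string json = JsonSerializer.Serialize(new { url = imageUrl });
        request.Content = new StringContent(json, Encoding.UTF8, "application/json");
        return request;
    }

    // Binary data based: the body is the raw bytes of the image.
    public static HttpRequestMessage FromBytes(string subscriptionKey, byte[] imageBytes)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, AnalyzeUri);
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        request.Content = new ByteArrayContent(imageBytes);
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        return request;
    }
}
```

Either request can then be sent with `await client.SendAsync(request)` on an `HttpClient`. Setting the key on each request, rather than on the client's default headers, makes it easier to share one `HttpClient` across several services.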

Example

Now let me show you a little example in a WPF application that consumes the Computer Vision API to detect what an image is all about. Azure will return a complete sentence that describes the image, the objects in it, and the activity taking place.

The XAML code for the WPF application is as below,

<Grid>
    <Grid.ColumnDefinitions>
        <ColumnDefinition />
        <ColumnDefinition />
    </Grid.ColumnDefinitions>
    <Border BorderBrush="Black" BorderThickness="1" Width="211" Height="188">
        <Image Name="image" HorizontalAlignment="Left" Grid.Column="0" Height="188" MouseLeftButtonDown="Image_MouseLeftButtonDown" VerticalAlignment="Top" Width="211"/>
    </Border>
    <Button Name="btn" Grid.Column="0" Margin="0,0,24,10" Height="20" Width="70" Click="btn_Click" VerticalAlignment="Bottom" HorizontalAlignment="Right">Process</Button>
    <Button Name="slct" Grid.Column="0" Margin="24,0,0,10" Height="20" Width="70" Click="slct_Click" VerticalAlignment="Bottom" HorizontalAlignment="Left">Select</Button>
    <TextBlock Name="rslt" Margin="10" VerticalAlignment="Center" TextWrapping="Wrap" Grid.Column="1" Text="Result will be here..." />
</Grid>


Figure 2: WPF application running, with no image selected.

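As for the backend, the C# code was written as follows:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Windows;
using System.Windows.Media.Imaging;
using Microsoft.Win32;

// Path of the image chosen in slct_Click; null until a file is selected.
private string fileName;

private async void btn_Click(object sender, RoutedEventArgs e)
{
    // Nothing to send until the user has picked a file.
    if (fileName == null)
    {
        MessageBox.Show("Select a file first.");
        return;
    }

    using (var client = new HttpClient())
    {
        // Request headers
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<the-subscription-key>");

        // Request parameters
        var uri = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze?visualFeatures=Description";

        // Request body
        byte[] byteData = File.ReadAllBytes(fileName);

        // Declared outside the inner using block so it is still in scope below.
        HttpResponseMessage response;
        using (var content = new ByteArrayContent(byteData))
        {
            content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
            response = await client.PostAsync(uri, content);
        }

        rslt.Text = await response.Content.ReadAsStringAsync();
    }
}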

private void slct_Click(object sender, RoutedEventArgs e)
{
    OpenFileDialog dialog = new OpenFileDialog();
    if (dialog.ShowDialog() == true)
    {
        // Something happened
        fileName = dialog.FileName;
        var source = new BitmapImage(new Uri(fileName));
        image.Source = source;
    }
}

The output of this code, once it worked, was:


Figure 3: Image selected and response captured from the Azure.

As you can see, this is the result, which can be mapped to a JSON object for storage or for further processing.
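Mapping the response to a JSON object could look something like the sketch below. It assumes the response for `visualFeatures=Description` contains a `description` object with a `captions` array whose items have a `text` property; check that against the actual response you get back. The helper name `VisionResponse.FirstCaption` is hypothetical, and `System.Text.Json` is an assumption too (Json.NET would do the same job on older .NET Framework projects).

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: pulls the first caption out of an "analyze" response.
public static class VisionResponse
{
    // Returns the text of the first caption, or null when there is none.
    // Assumes the root of the document (and "description") is a JSON object.
    public static string FirstCaption(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            if (doc.RootElement.TryGetProperty("description", out JsonElement description) &&
                description.TryGetProperty("captions", out JsonElement captions) &&
                captions.ValueKind == JsonValueKind.Array &&
                captions.GetArrayLength() > 0 &&
                captions[0].TryGetProperty("text", out JsonElement text))
            {
                return text.GetString();
            }

            return null;
        }
    }
}
```

In the WPF example, this would let you show only the sentence, for instance `rslt.Text = VisionResponse.FirstCaption(body) ?? "No caption returned."`, where `body` is the string read from the response.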

Tip #4: Timing is everything

Microsoft CS is only of interest to us if it can guarantee that we get results in a timely manner. For example, if we use Microsoft CS in security applications, users must get results quickly, and lag may force us to reconsider the design.

So, I wanted to show the duration of the request as well, to demonstrate how this all works. For that, I modified the code as follows:

// Requires: using System.Diagnostics;
Stopwatch watch = Stopwatch.StartNew();
response = await client.PostAsync(uri, content);
watch.Stop();

rslt.Text = $"Request took {watch.ElapsedMilliseconds} ms to complete, for {byteData.Length} sized byte array.\n\n";

rslt.Text += await response.Content.ReadAsStringAsync();

The effect of this was that I was able to determine how long it takes to process the request and return the result.


Figure 4: Application showing the time as well.

Look at the top paragraph; it says, “Request took 3519 milliseconds to complete, for 33282 sized byte array.” This means that processing a file of roughly 33 kB took around 3.5 seconds. Other factors contributed to the delay, such as my Internet connection. Also, a larger image file will take more time, while a smaller image will be processed more quickly but may give less accurate results.
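Since one measurement says little, it can help to wrap the timing in a small reusable helper and run the same request a few times. The following is only a sketch; `Timing.MeasureAsync` is a made-up name, and it measures the whole round trip (network included), not the processing time inside Azure.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: times any awaitable operation and returns its result.
public static class Timing
{
    public static async Task<(T Result, TimeSpan Elapsed)> MeasureAsync<T>(Func<Task<T>> operation)
    {
        var watch = Stopwatch.StartNew();
        T result = await operation();
        watch.Stop();
        return (result, watch.Elapsed);
    }
}
```

With this in place, the call becomes something like `var (response, elapsed) = await Timing.MeasureAsync(() => client.PostAsync(uri, content));`, and `elapsed.TotalMilliseconds` can be logged for each run.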

There are a few things that we learn from this…

  1. The timing of Azure itself is not the big factor; the factors are
    1. Our own Internet connection
    2. The image itself
  2. Type of processing to be done is important
    1. Processing a 15-second sound track of low quality vs. a 1-minute sound track of high quality is never going to take the same time.
  3. CDNs may or may not help in this case

Finally, the requirements differ from API to API, which is why I will not talk about a recommended image size. But you can improve the performance of your application by uploading the files directly to Azure, because otherwise Azure has to download the file from the URL first in order to process it. So, why not upload it directly?

Tip #5: Security

The keys for your application are really crucial. If they are lost, or accessible to anyone, then you are responsible for what happens. In the worst case, someone may use your resources for their own purposes, and you will be the one who is charged!

Remember Tip #2: if you followed my advice, you can now easily change a key if you feel someone else has access to it.


Figure 5: Keys shown for the Microsoft Cognitive Service purchased from Azure.

Otherwise, there are other ways to protect the keys if you don’t like updating them every month. One is to store the keys in a secure place, such as Azure Key Vault, or anywhere else that nobody else can access… But what if someone does get access? 🙂

Things can go wrong in many ways, so my recommendation is to update the keys every month. Note that you can use either Key 1 or Key 2, and you can regenerate each key independently of the other.

Reminder: While you were reading this post, I went back and regenerated the keys… It took only 4 seconds to regenerate both. 🙂
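Whatever store you choose, the first step is to get the key out of the source code. The sketch below reads it from configuration at run time and falls back to the second key when the first is not set, which is handy while one key is being regenerated. To keep it short it uses environment variables as a stand-in for a proper secret store such as Key Vault; the variable names `COGNITIVE_KEY_1` and `COGNITIVE_KEY_2` and the class name are invented for this example.

```csharp
using System;

// Hypothetical helper: resolves the subscription key from configuration.
public static class SubscriptionKeys
{
    // Invented variable names; use whatever your deployment defines.
    public const string PrimaryVariable = "COGNITIVE_KEY_1";
    public const string SecondaryVariable = "COGNITIVE_KEY_2";

    // 'lookup' maps a setting name to its value (or null when it is not set).
    public static string Resolve(Func<string, string> lookup)
    {
        string primary = lookup(PrimaryVariable);
        if (!string.IsNullOrWhiteSpace(primary))
        {
            return primary;
        }

        string secondary = lookup(SecondaryVariable);
        if (!string.IsNullOrWhiteSpace(secondary))
        {
            return secondary;
        }

        throw new InvalidOperationException("No Cognitive Services key is configured.");
    }

    // Convenience overload that reads the process environment.
    public static string ResolveFromEnvironment()
    {
        return Resolve(Environment.GetEnvironmentVariable);
    }
}
```

The result of `SubscriptionKeys.ResolveFromEnvironment()` can then replace the hard-coded `"<the-subscription-key>"` string. Passing the lookup as a delegate keeps the fallback logic independent of where the keys actually live.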

Final Words

I have no words, seriously. I am out of words at the moment, so, I hope you enjoyed the post. 🙂 See you next time.
