A nice-to-have feature in Google Translate

Recently, I was using this very useful service from Google when something funny happened. The text of the site was in English instead of the foreign language. Perfect. Then something unexpected happened. But let’s go back to the beginning…

A city and a language

The town of Zadar is in Croatia. The official website is in the local language. No surprise here. In the middle of the page sit three option lists (commonly called drop-down boxes). Their content seems mysterious:

zadar-hr-list-1

While the content of the list remains unknown to me, the choice is available, so why not? I pick the first element in the list. Could the words “tip linje” mean “line type”? Maybe. Once the choice is made, the second list changes to what looks like a selection of names:

zadar-hr-list-2

Now, the third list offers some options:

zadar-hr-list-3

All this seems to be some sort of guidance. Too bad I don’t know the language. But, hey, there is a great tool that can translate the text into English. You know what I mean. Yes, Google, come and help!

Magic ….

Quickly, I typed the address of the site and the magic happened. The unknown language (Croatian) changed to English. Too good to be true. The images were still in the local language, but this would be too complex a problem to solve, even for the giant of giants. Anyway, I see what I’m looking for: three lists of choices. Now, the content of the first one is clear:

zadar-en-list-1

Quickly, I pick the first option. Then comes the surprise:

zadar-en-list-2

 

The second list didn’t change at all. This is because the text of the first list changed and, most likely, the connection between the two of them broke. Not a real stunner, but still far from what I wanted to get.

… is unexplained science

It seems that the translation worked even at the lowest of levels, i.e. the internal navigation system of the page. The translator performed very well. Too well to be true! What I needed here was to have the lists of options shown with English text where applicable, while the original boxes stay hidden but functioning, and the hidden levers get operated as needed. I am confident this will be included in one of the future versions of Google Translate.

At last ….

Then I saw a pair of small buttons, on the top menu line:

zadar-hr-en-versions

After I clicked on the right button, magic happened. The page changed into English. The option lists continued to function.

What next

I believe the next version of the translation engines (there is more than one) should take into account the small details presented here. There are several possibilities:

  • adapt to the type of the content (e.g. take care of the list boxes and the navigation system)
  • detect the language-change buttons and present the correct version of the page directly
  • or separate the navigation system of the page from the text content and reassemble them accordingly
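The idea of keeping navigation apart from text can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Option` type, the `GLOSSARY` entries and the option texts are invented for the example and are not taken from the Zadar site or from Google Translate. The point is that each option carries an internal value, which the page's scripts rely on, and a visible label, which is the only part a translator should touch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    value: str  # internal key the page's scripts depend on; never translated
    label: str  # text shown to the reader; safe to translate


# Hypothetical glossary standing in for a real translation engine.
GLOSSARY = {
    "Gradska linija": "City line",
    "Prigradska linija": "Suburban line",
}


def translate_options(options, glossary):
    """Return new options with translated labels and untouched values."""
    return [Option(o.value, glossary.get(o.label, o.label)) for o in options]


# Invented sample data: value and label start out identical, as on many pages.
original = [
    Option(value="Gradska linija", label="Gradska linija"),
    Option(value="Prigradska linija", label="Prigradska linija"),
]
translated = translate_options(original, GLOSSARY)

for before, after in zip(original, translated):
    print(f"{before.label!r} -> {after.label!r} (value stays {after.value!r})")
```

Because the values are left alone, a script that fills the second list based on the first list's value would keep working after translation.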

I am sure it is only a matter of time until the translation systems work flawlessly. Until then, pay attention to the small details, like the buttons at the top of the page.

The missing link

You can check for yourself here.

Could there be a third bubble?

2023-machine-learning-bubble

2001

Many remember that for three years, from 1998 until the dot-com bubble burst, many new startups were created on the basis of having little more than a something.com name. They were collecting money from the banks with little scrutiny. Of course, there were some checks, but they were minimal.

The bigger the promise of such a company, the more dollars flowed into it. After all, the internet was the next big thing, and almost twenty years later it still is. And in most cases, the majority of the investors didn’t understand how those startups were supposed to generate revenue.

Magic equation: 1998 + 3 = 2001 (3 = 11 in binary)

2008

A more recent event, whose consequences are still felt today, happened in the late 2000s. Subprime loans were the next big thing. Nobody could explain how they worked, but this didn’t prevent massive investments, much bigger than in 2001.

The aftermath saw the demise of several respected financial institutions and the birth of regulation meant to prevent such a bubble from forming again. Until now, the changes have held. So far, so good.

Magic equation: 2001 + 7 = 2008 (7 = 111 in binary)

2023

This scenario could never happen. After all, software is safer than ever, or it should be. A new, very promising type of investment has appeared: machine learning startups. As Steve Jobs said, it works like magic. Arthur C. Clarke, the author of 2001: A Space Odyssey, wrote decades ago that any sufficiently advanced technology is indistinguishable from magic. Does it ring any bells?

No one understands machine learning. It creates optimized mathematical structures (matrices) that find the best-fit candidate among a bunch of contenders. The mystery happens during what is called the training phase. The problem is that this phase takes tens if not hundreds of thousands of cycles, during which understandable concepts are converted into something that cannot be explained, but works in most cases. I write “most” because there are some exceptions to the rule, and those exceptions may be the key to the mystery.

Magic equation: 2008 + 15 = 2023 (15 = 1111 in binary)

The equations

The three equations presented here are just educated guesses. As a rule of thumb, the time mankind needs to rise again after a fall is more than twice the time it took to go down. After every bubble, the time to the next one roughly doubles. Of course, this could be just a coincidence, but what if 2023 is the time of the next bubble?
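For readers who like to check the arithmetic, here is a small Python sketch of the pattern behind the three equations. It assumes nothing beyond the four years quoted in this article; the function names are made up for the example. Each gap is one less than a power of two, which is why its binary form is a row of ones, and each gap is twice the previous one plus one.

```python
def gaps(years):
    """Differences between consecutive years."""
    return [later - earlier for earlier, later in zip(years, years[1:])]


def is_all_ones(n):
    """True when the binary form of n contains only the digit 1."""
    return n > 0 and set(bin(n)[2:]) == {"1"}


YEARS = [1998, 2001, 2008, 2023]  # the years used in the magic equations

for start, gap in zip(YEARS, gaps(YEARS)):
    print(f"{start} + {gap} = {start + gap} ({gap} = {bin(gap)[2:]} in binary)")

# Each gap follows from the previous one as 2 * gap + 1.
intervals = gaps(YEARS)
print(all(b == 2 * a + 1 for a, b in zip(intervals, intervals[1:])))
```

This only restates the coincidence in code; it says nothing about whether the pattern will hold.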

My idea is to have this article written long before that date. This will be enough time for Google to index it, and if that event happens, well, it might lead to a theory of bubbles. If, hopefully, the bubble doesn’t happen, then the alternate scenario stays just an unverified hypothesis. Until then, there are four years to go.

A weakness in Google’s strength

Note: this is a series. You can start here.

google-weakness-question

In a previous article I started to analyze the search engine markets. The article ended with a promise to show what I believe is the greatest weakness of Google search. Don’t get me wrong. The internet giant does a splendid job when it comes to guessing what people want to find.

google-search-for.png

The mind-reading game works well for standard content. Most of the time, it works perfectly. However, there is a type of search that does not work, at least not for now. This is what I consider a weakness in the strength of the Mountain View company: not being able to create original content during the query.

search-engine-expected-answer.png

For example, let’s say someone wants to query about something that is not in any article indexed by Google, at least not in its entirety. Taken as individual parts, the query’s search terms could lead to some results, but not to what the user is looking for.

One could hypothetically imagine a scenario where the search engine builds the results out of thin air. After all, people doing research for a PhD are able to succeed at the task.

phd-create-original-content

The root of the problem comes, I believe, from something that is specific to the Valley culture: most of the companies have very young teams. After all, this is the basic prerequisite of a startup: break the barriers of age and think divergently. But divergent thinking from people in their 20s, while good for innovation, lacks a certain level of maturity. It is a paradox. One cannot be young and have a lot of experience, even if today many entrepreneurs begin their professional life before university. Experience is something that someone acquires over time, at certain critical moments.

a-i-some-sort-of-symbol

I don’t know if A.I. development in the next 5-10 years will answer the question of maturity. One thing is sure: search engines still have a lot of space to conquer and, from a certain point of view, they are still immature. After all, the whole SEO wars would never have happened if the systems were fully developed.

More to come in a later article ….

Google search and barriers to entry

Note: this is part of a series about Five Forces and Google search. And yes, red and green is a peculiar color combination.

google-market-share.png

Current market division

As of early 2019, the biggest players are Google, Bing and Yahoo. Of course, the market leader is crushing the competition. Still, I believe that by the mid-2020s there will be a significant change and a new leader will appear, a new player. But until then, let’s give to Caesar what belongs to Caesar.

The red part is Google. The green sector is the sum of the other players (Bing, Yahoo, etc.). Yes, it is a monopolistic market. But there is more to it than that.

google-barriers-to-entry

Barrier to entry

After cannibalizing Yahoo’s market share in the early 2000s, Google erected a very high fence. Still, one must ask what market Google is playing in. If you believe Google is the no. 1 search engine in all the information markets, I have to disagree. There are several types of market and the Valley giant is playing in only one of them. It is the biggest, yes, but that could change in the future.

The markets

It is my belief that there are at least four types of information markets:

  • the old and public market
  • the very fresh and unknown market
  • the hidden market
  • the robots.txt market

Obviously, an information broker like Google has specialized in analyzing the first type of market. So far, it seems to beat everyone there.

As for the second type of market, the information is too new to rise in the organic rankings. So, in a way, it is relevant, yet ignored. This is due to the major weakness of Google, and I will explain it later.

The third market is not accessible to any of the top 3 search engines. The root cause is cost. Either one has to pay for the information, or the data is physically protected and never available to the general audience.

A particular category of market is the one controlled by robots.txt files. Normally, a search engine is not supposed to look there. I write “normally” because no one knows how polite and respectful a search engine really is.
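To make the robots.txt idea concrete, here is a minimal sketch that uses Python’s standard `urllib.robotparser` module. The rules, the `ExampleBot` name and the example.com addresses are invented for illustration. The sketch shows how a polite crawler would check a page before fetching it; nothing in the file format can force a crawler to run this check.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a hypothetical crawler name.
for url in (
    "https://example.com/public/index.html",
    "https://example.com/private/report.html",
):
    verdict = "allowed" if is_allowed(ROBOTS_TXT, "ExampleBot", url) else "off limits"
    print(f"{url}: {verdict}")
```

The check is voluntary, which is exactly why the politeness of a search engine matters here.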

google-last-word

One last word

I do believe that the three markets where Google has no control might develop in a spectacular way. Do you see a weakness in the strategy of the internet giant? If not, wait for my next article.

Come back later for more about the 5F+1 table on Google search… Yes, there are weaknesses and threats.

Five Forces: Google search

Google’s golden elephant: advertising + information provider. Yes, I know, they are strong in many other areas, but this article will focus on the search engine, the historical icon of the internet.

5f-google-magnifying-glass

Google search provides information to billions (!) of people. Don’t have time to go to the library? Google it. Looking for something? Google knows. The engine has become so good that people use it as a second brain. Some say that the global IQ of the planet went down because of Google. Really? Let’s have a closer look.

5f-google-hammer-nail

Hammer, fingers, nail

It is a mistake to blame the tool. It’s like letting the hammer fall on the finger instead of the nail. Google search is a tool, a very sophisticated workhorse, but that is all. Like it or not, it is there, ready to be used. And most of the time it is used. By most, I mean something like 99%.

5f-google-blueprints-car

Drawing made with the mouse. Arghh!

Now, a search engine is only as good as the information it provides. Looking for the blueprints of the latest Tesla car (Model Y)? No, they are not public and no search engine can provide you with that information. This is where the limits of the search engine lie.

5F-google-fresh-old.png

The search engine has to make trade-offs and decide which 10 to 20 websites are the most relevant. They are the first results to be shown. In general, fresh news becomes yesterday’s news in a matter of minutes. However, if we leave out the buzz about the Kardashians and Nadal, the news becomes more stable.

This will be continued in a future article ….

Kerlink – financial statements

financial-statements-ren-on-blue

Before jumping into the water

I have started the analysis of the publicly available financial reports of Kerlink, the IoT player. I tried to use here a color scheme as close to the official web page as possible. Before going further, I would like to comment on the choice of colors. Finance requires a strong focus on figures. It is not the first time I have seen this red text on a blue background. From my experience, sooner or later, someone will notice that the text is a bit hard to read.

 

Speech recognition and the real world

percent-5-1

The margin of error is shrinking

Introduction

A recent article from GeekWire caught my attention. It seems that Microsoft, a pioneer in speech recognition, has reached a record-low error rate. In one year, this rate has fallen from 5.9% to 5.1%. It seems impressive. IBM has announced an improvement of their speech recognition engine, too, down from 6.9% to 5.5%. Alexa from Amazon is also improving. Siri from Apple is getting better than ever. The same goes for Google. Competition is healthy because it drives innovation and paves the way to breakthroughs. Yet, today, everyone is using the same magic. Could it be the wrong magic?

speech-recognition

An artificial neural network

The magic under the hood

Today some, if not all, of the speech engines use what is called a neural network. Basically, the machine tries to imitate the human brain. And the misconception in neural networks is the following: there are 100 billion neurons in the human brain, each with 100 to 10,000 connections. Those connections are extremely important to human intelligence. So, by the numbers, there are between 10 trillion and 1 quadrillion connections.
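The range quoted above is plain multiplication, and a short Python check makes the orders of magnitude easier to see. The figures are the ones used in this article; the constant names are made up for the example.

```python
NEURONS = 100 * 10**9           # 100 billion neurons, as quoted above
MIN_LINKS_PER_NEURON = 100      # lower bound on connections per neuron
MAX_LINKS_PER_NEURON = 10_000   # upper bound on connections per neuron

low = NEURONS * MIN_LINKS_PER_NEURON    # 10**13, i.e. 10 trillion
high = NEURONS * MAX_LINKS_PER_NEURON   # 10**15, i.e. 1 quadrillion

print(f"between {low:,} and {high:,} connections")
```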

A big number, but after all, just a number. All we need is to have 1 quadrillion processors or something equivalent and the system will be as smart as a human. Well, something has been omitted here. Yes, there are that many connections in the human brain, but the part that is considered intelligent has much less ‘smart material’. If the human brain is a ball 16 centimeters in diameter, the ‘intelligent’ part of it is an outer layer less than 3 mm thick. It is the cerebral cortex. The rest of the brain is the animal part. Somehow, the intelligent layer of the brain has a quality that makes us smarter.
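To get a feel for how thin that layer is, here is a rough Python estimate. It takes the simplified picture above at face value, a smooth ball 16 cm across with a 3 mm outer shell, and ignores the folding of a real cortex, so the result is only an illustration of the geometry. The function name is made up for the example.

```python
def shell_volume_fraction(diameter_cm, thickness_cm):
    """Share of a ball's volume taken by an outer shell of given thickness."""
    radius = diameter_cm / 2
    inner_radius = radius - thickness_cm
    # Volumes scale with the cube of the radius, so the constant 4/3 * pi cancels.
    return 1 - (inner_radius / radius) ** 3


fraction = shell_volume_fraction(diameter_cm=16, thickness_cm=0.3)
print(f"outer layer: about {fraction:.1%} of the ball's volume")
```

Under these assumptions the outer layer comes to roughly a tenth of the total volume.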

one-of-twenty

Only 5% of the words matter here

 

The real challenge

IBM claims that one word out of 20 is missed by a human listener. While I don’t agree with the claim, one fact is sure: people speak differently:

  • different speeds;
  • different volumes;
  • different vocabularies;
  • different pronunciations

and so on. All these differences add to the challenge of understanding speech. The English language has about 1 million words, and 5% of a million is 50,000 words: as many as the everyday vocabulary of a typical speaker. Imagine 20 people in the countryside. Only one of them knows how to get to the king’s castle, while the other 19 give out misleading information. According to the current state of the art, no speech recognition system can guarantee to bring you to the king. And if such a system were to be part of a self-driving car, well, I don’t even want to imagine.

six-sigma

The true challenge is to get to Six-Sigma

The true breakthrough

A good speech system should be much closer to Six-Sigma, because it should be able to infer which word it missed, make a correct guess and ask clarifying questions. For those who are not aware, Six-Sigma is about 3.4 errors in a million.
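The distance between today’s error rates and Six-Sigma is easier to grasp on a common scale. The short Python sketch below converts the percentages quoted in this article into errors per million words and compares them with the 3.4 figure. Treating every misrecognized word as one “error” in the Six-Sigma sense is an assumption made only for this comparison.

```python
SIX_SIGMA_PER_MILLION = 3.4  # the Six-Sigma figure quoted above


def errors_per_million(rate_percent):
    """Convert an error rate in percent into errors per million words."""
    return rate_percent * 10_000


def times_above_six_sigma(rate_percent):
    """How many times higher than the Six-Sigma level a given rate is."""
    return errors_per_million(rate_percent) / SIX_SIGMA_PER_MILLION


for name, rate in (("Microsoft", 5.1), ("IBM", 5.5)):
    print(
        f"{name}: {errors_per_million(rate):,.0f} errors per million, "
        f"about {times_above_six_sigma(rate):,.0f} times the Six-Sigma level"
    )
```

In other words, the best quoted rate would have to improve by a factor of roughly 15,000 to reach that level.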

Don’t misunderstand me. 5% is a great improvement. I remember when, 20 years ago, I used Microsoft’s experimental speech recognition system and each time I said ‘iexplore’ it understood ‘Netscape’. Yes, such was the case. Today it has changed, but 5% is not good enough for me. Not if I want to put the system in a place where people’s lives depend on it.

The potential of IoT

While I am still a bit skeptical, there is a huge potential for speech recognition. By embedding Alexa or Siri into a small device like a temperature controller or a water tap controller, we could interact in a more humane way with our environment. So there is hope. A new hope.

So keep working, Microsoft, IBM, Amazon, Google and all the other teams. The road is not a pleasant walk, but at the end of it there is such a big reward…

 

Can Microsoft become a key IoT player?

A recent survey on Twitter produced the following results:

msft-iot

It seems Samsung is perceived as a more important IoT player than Microsoft. Also, one third of the respondents believe that Microsoft will abandon the IoT arena. However, when Apple launched the iPad, Microsoft tried to come on the market with a similar product. After a quick failure, the company from Redmond understood its mistakes, took its time and brought Surface to the market.

The question is why Samsung is so strong and why it is perceived as so strong. After all, the Korean giant is best known as a mass producer, not as a pioneer. IoT is a technology based on diversity. This contrasts with mass production.


My guess is that Microsoft is seen as an underdog. This could be a mistake from Samsung. The West Coast IT company is known to make mistakes, but is also recognized as a street fighter. And street fighters are good at winning matches.

Five Forces: Twitter’s Advertising Business Division

Twitter’s main businesses: advertising + data provider

This analysis focuses on advertising

 

Force#5 : New Entrants

  • Barrier level : high

Force#3 : Suppliers

  • Direct known suppliers : opinion leaders, companies worried about their own users
  • Indirect unknown suppliers : people in a place at a specific time
  • Direct unknown suppliers : re-tweeters
  • Status : Twitter is the preferred means of sharing information for people who own a smartphone. WiFi and data connectivity are of the essence.

Force#1 : Competition

  • Direct known competitors : Google, Facebook, LinkedIn, MySay
  • Direct unknown competitors : Press / News agencies
  • Indirect known competition : other micro blogging
  • Status : very strong competition

Force#2 : Buyers

  • Direct known buyers : Brands, Marketers who run frequent campaigns
  • Indirect known buyers : brands and agencies that get re-tweeted
  • Status : Being #1 gives Twitter bargaining power, but there are alternatives (competition)

Force#4 : Substitutes

  • Known substitutes : RSS feeds, billboards
  • Unknown substitutes : Group SMS messaging
  • Status : Fading substitutes

Force#6 : Complements

  • Little known about

Note: green = healthy, orange = possible threat, red = vulnerable