Simman's Blog

Saturday, July 21, 2012

Introduction to Pattern Recognition

A pattern is a repeated decorative design or, more generally, anything that repeats in a recognizable way. We humans are very good at recognizing patterns in our everyday life. We can't live without patterns. Our brain is constantly classifying things based on patterns, and every day we are learning new ones. There is a huge difference between what we study and what we practice. Even when there were no colleges, humans had to distinguish between friend and foe using patterns.

For example, if I show you the following diagram, what will you say?

Most probably (given my poor drawing), you will say it is a car. But does it resemble any car parked outside? What we recognize as a car is nothing but the shape, wheels, windows, steering wheel and headlights. The rest of the details are unimportant to us.

Let us take the example of a six-month-old baby. He cannot talk, he cannot move around and he cannot understand most things. But he still smiles immediately at his mother. The baby can do pattern recognition in no time, because it is an essential part of our living. Recognizing the mother is the first important thing for our survival.

There are a lot of ways in which patterns can be found. They are classified into two broad categories, supervised and unsupervised. Recognizing the mother is unsupervised: the baby forms a cluster of the people who are around him frequently, and all other people are placed in a different cluster. Recognizing a car, however, is supervised learning. Someone needs to tell the child that the picture is a car, that what Papa is driving is a car, and so on. Through repeated interaction and experience, the child learns that it is a car.

There are many algorithms available to find a pattern. OK, after finding the pattern, what does one do with it? The pattern can be used for predictive analysis. For example, suppose I collect data on all the school kids in the US and analyze it. I also check where each kid ended up. By figuring out the correlation (we will come back to this later) between the data and where each kid ended up, I can work out the pattern that characterizes school dropouts.

When I get current data, say the data from the end of the first semester, I can apply the above pattern to figure out which kids are likely to drop out of school. I can then take effective action to address the problems of each such kid and save him/her.

The above is an example of supervised learning. I have the past data with grades and where the kids ended up. I am finding a pattern, in other words a mathematical formula, that approximates my findings within a reasonable error limit. When new data comes in, I can check what the output is going to be.
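To make "a mathematical formula" a little more concrete, here is a minimal sketch in Python. It fits a straight line to past data with ordinary least squares, which is only one of many possible techniques. The numbers (days absent and final grade) are invented purely for illustration; they are not real school data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented past data: days absent in the first semester -> final grade.
days_absent = [0, 2, 4, 6, 8]
final_grade = [90, 80, 70, 60, 50]

# "Learning" the pattern from the past data.
slope, intercept = fit_line(days_absent, final_grade)


def predict(days):
    """Apply the learned formula to new data."""
    return slope * days + intercept


print(predict(10))  # predicted final grade for a kid with 10 days absent
```

The past data with known outcomes is what makes this supervised: the formula is tuned until it reproduces the known answers, and only then is it applied to a new kid.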

This technique is used in many areas, such as credit card ratings, fraud detection and spam filtering.

If I don't know the output, I can still organize objects into groups whose members are similar in some way. For example, I am observing some symptoms (headache, temperature etc.) and I want to check whether the person has flu or a brain tumor. Even though not every symptom of a disease will be present in every patient, I can still place the people into groups such as having flu, having a brain tumor, etc.
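As a rough illustration of grouping without known outputs, here is a tiny one-dimensional k-means sketch in Python. K-means is just one clustering algorithm that I picked for the example, and the body temperatures and starting centers are made up.

```python
def k_means_1d(values, centers, iterations=10):
    """Very small k-means for one-dimensional data."""
    for _ in range(iterations):
        # Assign each value to the nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Invented body temperatures in degrees Celsius; nobody has labelled them.
temperatures = [36.5, 36.7, 36.9, 38.8, 39.1, 39.4]

centers, clusters = k_means_1d(temperatures, centers=[36.0, 40.0])
print(clusters)
```

Nobody told the program which readings are "normal" and which are "fever". The two groups come only from the readings being close to each other.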

There are many interesting algorithms for both supervised and unsupervised learning. I will explain them in more detail in the coming articles.

Thursday, July 19, 2012

Predictive Analytics

Predicting something is always tricky. Most of the time the prediction may go wrong. So what to do? Blame it on statistics. My prediction will be right 50% of the time.

When I said this to my Professor B. Yegnanarayana, he commented that to get 50% predictability I need to work on mathematical models and write huge computer programs that run for days. Instead, he can just flip a coin!

We are getting more and more into Big Data analytics. There are numerous articles on Big Data, and it is a hot topic of 2012, like SOA was some time back.

But how good is predictive analysis? There are a number of examples which show that if the business is looking in the right direction, it will find the pattern that solves the problem. Credit scoring, fraud detection and insurance are some of the areas where predictive analytics is used. Refer to the article http://practicalanalytics.wordpress.com/predictive-analytics-101/ for more information.

This can be done in many different ways that include machine learning, game theory and data mining.

I will start blogging about each one of the areas in the coming articles.

Wednesday, July 18, 2012

SoLoMoMe

Today I came across the interesting term "SoLoMoMe". It is a combination of Social + Local + Mobile + Personalization.

This is mainly used in digital marketing.

How do people come up with these acronyms? :-)

Sunday, April 24, 2011

pseudonym in SAML 2.0

I was reading the SAML 2.0 Technical Overview. For a complete understanding of SAML, read the SAML Technical Overview.

While describing pseudonyms, the paper says that two user identities can be linked by the IdPs. For example, I have a Gmail account narasimmanr@gmail.com and a Yahoo account narasimmanr@yahoo.com. I want to link these two accounts. How will I do this?

1. I log in to one of the accounts, say Yahoo.

2. I send a message to Yahoo asking it to link my Gmail id.

3. Yahoo sends a mail to Gmail stating that narasimmanr@yahoo.com wants to link narasimmanr@gmail.com.

4. Since Yahoo must know that both accounts are owned by the same individual, Gmail sends a mail to narasimmanr@gmail.com asking for permission.

5. When I log in to narasimmanr@gmail.com, I see this mail and give the permission.

6. Gmail sends the permission to Yahoo.

7. Both accounts are linked.

The happy path scenario works very well.

But the alternate path is where the doubt arises.

Suppose narasimmanr@gmail.com is not owned by me. The real owner will look at the request and reject it. No problem: Gmail will send a rejection to Yahoo, and Yahoo will reply to me saying that the linking has failed.

But suppose the other person, knowingly or unknowingly, accepts the linking. Now I can reach his Google apps, and all other sites that trust Google, through my Yahoo id. The same applies to him/her as well.

The question is: how can one avoid this?

In the correct scenario, delinking will also work. In the other scenario, if I delink the two accounts, will there be a confirmation from the other account as well? I think this is not required, since once the accounts are linked both IdPs think that it is a single user.

Any thoughts on this?

Federated Authentication

In Kerberos authentication, all the parties are in a single domain. When Windows first introduced identity management, the biggest problem for a three-tier application was that the business tier was unable to retrieve the user's identity, since the identity was not passed over multiple hops.

The only way to overcome this was to create your own identity provider, a security component that authenticates the user and passes a token to the application.

When we move from a single domain to the Internet, where there are disparate systems, all systems should at least agree on a common protocol to communicate.

If a client requests a service, it should present a common identity, a token, that is understood by all the systems. The standard for this is the Security Assertion Markup Language, called SAML.

The token must be issued by a system called the Identity Provider (IdP). The IdP authenticates the user and issues a SAML token.

For a complete understanding of SAML, read the article

To provide its service, the service provider must trust the IdP.

Instead of providing only the result of the authentication (like yes/no), the SAML token can also contain more information. These pieces of information are called attributes within the SAML token. Each attribute is a name-value pair. For example, when the SAML token is passed from a university to a library, it can contain the user name and the department he/she belongs to. So Department is the attribute name and, say, Computer Science is its value.
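Here is a small sketch of how such attributes could be read in Python with the standard library. The XML fragment is hand-written for illustration (the user name alice and the attribute names are my own assumptions), and a real service must verify the token's signature before trusting anything in it, which this sketch does not do.

```python
import xml.etree.ElementTree as ET

# Namespace of SAML 2.0 assertions.
SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# Hand-written attribute statement, for illustration only.
ATTRIBUTE_STATEMENT = """
<saml:AttributeStatement xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:Attribute Name="UserName">
    <saml:AttributeValue>alice</saml:AttributeValue>
  </saml:Attribute>
  <saml:Attribute Name="Department">
    <saml:AttributeValue>Computer Science</saml:AttributeValue>
  </saml:Attribute>
</saml:AttributeStatement>
"""


def read_attributes(xml_text):
    """Return {attribute name: [values]} from an AttributeStatement element."""
    root = ET.fromstring(xml_text.strip())
    attributes = {}
    for attr in root.findall("saml:Attribute", SAML_NS):
        values = [v.text for v in attr.findall("saml:AttributeValue", SAML_NS)]
        attributes[attr.get("Name")] = values
    return attributes


attributes = read_attributes(ATTRIBUTE_STATEMENT)
print(attributes["Department"])
```

Each attribute maps to a list because a single SAML attribute may carry more than one value.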

According to Wikipedia, "Claims based authentication is the process of authenticating a user based on a set of claims about its identity contained in a trusted token. Such a token is often issued and signed by an entity that is able to authenticate the user by other means, and that is trusted by the entity doing the claims based authentication."

Thursday, April 14, 2011

Kerberos Authentication - Example Changed

Hi All

In the previous blog we saw the example of Alice sending cookies to Bob. But the analogy is not correct, since there Alice becomes the server and Bob, the consumer of that service, the client. What happens if someone else consumes the cookies, and so on?

We can change the example so that Alice wants the cookies from Bob, and only from Bob. Bob wants to give the cookies only to Alice. Bob also wants to give the cookies to Alice within a reasonable time, so that orders are not duplicated.

Now go through the example again. Instead of Alice sending the cookies, she wants the cookies. Instead of eating the cookies, Bob will send them.

Tuesday, April 12, 2011

Kerberos Authentication

I am going through the Single Sign-On concept.

Before that, one needs to understand Kerberos authentication, which is a token-based mechanism for using services.

Kerberos authentication involves 4 parties: the client, the authentication server (AS), the ticket granting server (TGS) and the application server.

The client knows only the authentication server, and it wants to get services from the application server.

The client and the application server should both be sure that the tickets have not been taken by some middle person.

So, to establish trust and identity, many messages are passed between the 4 parties.

1. The user logs in to the client machine. (If the client is connected to an Active Directory, then Kerberos is used again.)

Now assume that the user has logged in to the client machine. The client and the authentication server derive the same key by hashing the user's password.

2. When the user wants to use an application on the application server, the client contacts the authentication server.

The authentication server is the one that establishes the identity of the user.

The client sends the user name, and the application server service which he/she wants to use, to the authentication server. Note that the client does not send the password to the AS.

3. The authentication server could authenticate the user and give access to the application server itself. But for this it would need to maintain all the servers in the domain, which is not the purpose of authentication. To separate the responsibilities, the Ticket Granting Server (TGS) is introduced. The TGS has the list of all the servers and their corresponding keys (in Windows these are all security certificates).

4. The client now needs to communicate with the TGS, and the TGS should know that the client has already been authenticated by the authentication server. Otherwise, the client could bypass the AS and come directly to the TGS.

5. When the client sends the clear-text message about the user and the service, the AS checks that the user exists (again, note that there is no authentication yet, since the password is not sent). The AS generates the one-way hash key from the username and password.

6. AS sends two messages to the client.

Msg 1: This contains the TGS session key that is to be used by the client. Since only that particular client should be able to use this session key, the message is encrypted with the client secret key. The secret key is nothing but the hash key of the username and password.

Msg 2: This is used to prove to the TGS that the client contacting it has been authenticated by the AS. So the AS passes a message that is encrypted with the TGS secret key. This message contains the client ID, its network address, the period of validity and the TGS session key itself.

I think this is the beauty of Kerberos authentication.

Now the TGS session key can be obtained by the client from Msg 1 and by the TGS from Msg 2. If the messages are intercepted by some other client in between, it cannot decrypt Msg 1 to get the TGS session key.
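Steps 5 and 6 can be sketched in Python as below. This is a toy, not real Kerberos: `seal` and `unseal` are helper names I made up, the "encryption" is an XOR keystream built from SHA-256 with an HMAC tag (fine for a demo, not for real use), and the user name, password, address and validity period are invented.

```python
import hashlib
import hmac
import json
import os


def derive_key(*parts):
    """One-way hash of the given strings, used here as a secret key."""
    return hashlib.sha256("|".join(parts).encode()).digest()


def _keystream(key, length):
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key, payload):
    """Toy encryption: XOR keystream plus an HMAC tag. Not for real use."""
    plain = json.dumps(payload).encode()
    cipher = bytes(a ^ b for a, b in zip(plain, _keystream(key, len(plain))))
    tag = hmac.new(key, cipher, hashlib.sha256).digest()
    return tag + cipher


def unseal(key, blob):
    """Reverse of seal(); raises ValueError if the key is wrong."""
    tag, cipher = blob[:32], blob[32:]
    expected = hmac.new(key, cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered message")
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, len(cipher))))
    return json.loads(plain)


# Keys the AS already shares with each party (all values invented).
client_key = derive_key("alice", "alice-password")  # known to client and AS
tgs_key = derive_key("tgs", "tgs-secret")           # known to AS and TGS

# The AS creates a fresh TGS session key and builds the two messages.
tgs_session_key = os.urandom(32).hex()
msg1 = seal(client_key, {"tgs_session_key": tgs_session_key})
msg2 = seal(tgs_key, {
    "client_id": "alice",
    "address": "10.0.0.5",
    "valid_seconds": 300,
    "tgs_session_key": tgs_session_key,
})

client_view = unseal(client_key, msg1)  # what the client learns from Msg 1
tgs_view = unseal(tgs_key, msg2)        # what the TGS later learns from Msg 2

# An eavesdropper without the client key cannot open Msg 1.
try:
    unseal(derive_key("mallory", "guess"), msg1)
except ValueError as err:
    print("eavesdropper failed:", err)
```

Both sides end up with the same session key although it never travels in clear text, and the password itself is never sent at all.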

7. Now the client gets the TGS session key from Msg 1.

8. Client sends two messages to TGS

Msg 3: Msg 2 + Application Server ID

Msg 4: Client ID and Time Stamp encrypted using the TGS session key.

9. The TGS can decrypt both Msg 3 and Msg 4. Msg 3 contains Msg 2, which is encrypted with the TGS secret key and so can be decrypted only by the TGS. From Msg 2 the TGS gets the information about the client and the TGS session key.

By decrypting Msg 4 with that TGS session key, it once again gets the client ID and the timestamp.

If another client hijacks Msg 2, it can try to use that same Msg 2 to communicate with the TGS. From Msg 2 alone, the TGS cannot verify whether it was intended for that client or not. By combining the two messages, the TGS can check whether the client IDs match; a hijacker cannot produce a valid Msg 4, because it does not know the TGS session key. If the IDs match, the communicating client is the one the AS authenticated. Also, without a further check a client could reuse the same Msg 2 every time, without contacting the AS. That is why the timestamp is used. With the timestamp, the TGS can check whether the request is within an acceptable time.
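The check done by the TGS in step 9 can be sketched as a small Python function. It works on already decrypted messages, represented as plain dictionaries, and the 300-second window is my own assumption for "an acceptable time".

```python
MAX_AGE_SECONDS = 300  # assumed acceptable window, not a value from the protocol


def tgs_accepts(ticket, authenticator, now):
    """ticket is the decrypted Msg 2, authenticator is the decrypted Msg 4."""
    # The client named by the AS must be the client that is talking to us.
    if ticket["client_id"] != authenticator["client_id"]:
        return False
    # Old authenticators are rejected, so a captured request cannot be replayed later.
    if abs(now - authenticator["timestamp"]) > MAX_AGE_SECONDS:
        return False
    return True


now = 1_000_000  # fixed "current time" in seconds, to keep the example repeatable
ticket = {"client_id": "alice"}

print(tgs_accepts(ticket, {"client_id": "alice", "timestamp": now - 10}, now))
print(tgs_accepts(ticket, {"client_id": "mallory", "timestamp": now - 10}, now))
print(tgs_accepts(ticket, {"client_id": "alice", "timestamp": now - 3600}, now))
```

Only the first request is accepted. The second has a different client ID from the ticket, and the third is an hour old.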

10. Now the TGS does what the AS did earlier. It sends two messages to the client.

Msg 5: The application token, which consists of the client ID, the network address and the client/server session key. This message is encrypted with the server's secret key.

Msg 6: The client/server session key, encrypted with the client/TGS session key (remember that the client/TGS session key has been exchanged already).

11. The client decrypts Msg 6 and gets the client/server session key. With that, it sends two messages to the server.

Msg 5: The application token received from the TGS, forwarded as it is.

Msg 7: Client ID + timestamp, encrypted with the client/server session key.

12. The server first decrypts Msg 5 with its secret key and retrieves the client ID and the client/server session key. Then it decrypts Msg 7 with the client/server session key. Again it compares the client IDs, and if they are the same, it sends the final message to the client, which contains the timestamp + 1. This message, as we know, is encrypted using the client/server session key, which was securely exchanged.

13. Now the client should know that it is receiving the message from the correct server.

It checks the timestamp, and if it is +1 from the timestamp it sent to the server, then it trusts the server.

14. Finally the client can receive the services from the server.
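The timestamp + 1 handshake of steps 12 and 13 can be sketched like this in Python. To keep it within the standard library, the reply is protected with an HMAC tag instead of being encrypted, which is a simplification of mine; the session key and timestamp are invented values.

```python
import hashlib
import hmac


def server_reply(session_key, client_timestamp):
    """Server side: answer with timestamp + 1, tagged with the session key."""
    value = client_timestamp + 1
    tag = hmac.new(session_key, str(value).encode(), hashlib.sha256).hexdigest()
    return value, tag


def client_trusts_server(session_key, sent_timestamp, reply):
    """Client side: the tag must verify and the value must be exactly +1."""
    value, tag = reply
    expected = hmac.new(session_key, str(value).encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected) and value == sent_timestamp + 1


session_key = b"client-server-session-key"  # invented; known to client and server
sent = 1_000_000                            # timestamp the client put in Msg 7

genuine = server_reply(session_key, sent)
forged = server_reply(b"some-other-key", sent)  # an impostor without the session key

print(client_trusts_server(session_key, sent, genuine))
print(client_trusts_server(session_key, sent, forged))
```

Only a party that could open Msg 5 knows the session key, so a correct reply tells the client it is talking to the real server.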

The whole process can be explained using a simple analogy.

Alice wants to send Bob some cookies, which may become stale after some time.

Alice and Bob don't know each other.

Alice and Bob both know Walter very well.

Walter has two locks: Alice has the key to one and Bob has the key to the other.

Alice, Bob and Walter have agreed to use number locks to send the items.

Alice tells Walter that she wants to send cookies to Bob.

Walter sends her a box that contains the number to be set on the number lock. The number, say 123, is written on a piece of paper and kept in the box.

This box is locked with Alice's lock. Walter also sends another box containing the number, Alice's name and the time when Alice agreed to send the cookies.

This box is locked with Bob's lock.

Alice opens the first box, since she has the key, and gets the number. She takes the second box and the cookies, puts both of them in an open box and sends it to Bob.

She also writes her name and the time of her request on a piece of paper and puts it in another box. She puts the number lock on this box and sets the lock code that she received from Walter.

Bob receives two boxes: one open box that contains another locked box and some cookies, and a second box that has a number lock on it.

Bob doesn't know the number to open the lock yet.

Bob could eat the cookies, but he wants to make sure that the party who sent them is known to Walter and that the cookies are not stale. He keeps the cookies aside and opens the locked box with his key.

He gets the number for the number lock. He opens the number lock and checks Alice's name and the time that Alice has put in that box.

If both match the contents of the box he opened with his key, he can eat the cookies.

Otherwise the cookies are either 'poisoned' or 'stale' :-)

If he wants to send a thank-you note to Alice, he can put the message in a box locked with the number he received (which Alice also knows).

Alice can open it and read it.