**Statement of Problem**

Given a list of transactions in the form…

- Transaction number
- Item code
- Qty
- Price

What is the probability that item A is associated with item B in a transaction?

This question interests retailers because it helps them understand shopper behaviour.

How strongly an item is associated with another item can tell the retailer the following…

- how to lay out the store
- what items should be placed next to which item
- items that are presented as a logical grouping but are not bought that way
- ideas for bundling promotion
- the shopper's decision tree

This question falls within the data mining domain.

It is an association problem.

**Algorithm**

For each transaction, and for each unique unordered pair of items in it, calculate the sum of value (sum(qty*price)). Then calculate each pair's percentage of the total value for the date.

The percentage of total is taken as the probability that item A is associated with item B.

This list of unordered pairs forms the fact table for market basket analysis.

The grain is

- pair of items
- weight (sum of value)
- transaction date
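The algorithm and grain above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: one line per item in each transaction, "sum of value" for a pair read as the qty × price of both lines added together, and the date's total taken as the sum of all pair values on that date so the shares add up to 1. `Line` and `fact_rows` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Line(NamedTuple):
    """One transaction line (hypothetical layout)."""
    txn: int
    date: str
    item: str
    qty: int
    price: float


def fact_rows(lines):
    """Return (item_a, item_b, weight, share, date) rows, one per pair per date."""
    baskets = defaultdict(list)
    for line in lines:
        baskets[line.txn].append(line)

    weights = defaultdict(float)  # (date, item_a, item_b) -> sum of value
    for basket in baskets.values():
        # every unique unordered pair of lines; assumes one line per item
        for a, b in combinations(basket, 2):
            item_a, item_b = sorted((a.item, b.item))
            weights[(a.date, item_a, item_b)] += a.qty * a.price + b.qty * b.price

    totals = defaultdict(float)  # date -> total pair value on that date
    for (date, _, _), w in weights.items():
        totals[date] += w
    return [
        (x, y, w, w / totals[date] if totals[date] else 0.0, date)
        for (date, x, y), w in weights.items()
    ]


lines = [
    Line(1, "2024-01-02", "bread", 1, 2.0), Line(1, "2024-01-02", "milk", 2, 1.5),
    Line(1, "2024-01-02", "eggs", 1, 3.0),
    Line(2, "2024-01-02", "bread", 1, 2.0), Line(2, "2024-01-02", "milk", 1, 1.5),
]
for row in fact_rows(lines):
    print(row)
```

Each output row matches the grain listed above: a pair of items, its weight and the transaction date. The share column is an extra field that holds the percentage of total.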

The cost of this simple algorithm grows quadratically with the number of lines per transaction.

In a grocery store, where a basket can have 50 items, the number of unordered pairs is 50×49/2 = 1,225.

The cost of this algorithm is O(n²) per basket: n*(n-1)/2 unordered pairs, where n is the number of items in the basket.

In a general merchandise store, where baskets hold fewer items, this algorithm is manageable.
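A quick Python check of the pair counts. `math.comb(n, 2)` (Python 3.8+) gives the number of unordered pairs, which is half of the n(n-1) ordered pairs. Both counts grow with n², not exponentially. `pair_counts` is a hypothetical helper name.

```python
import math
from itertools import combinations


def pair_counts(n):
    """Return (ordered, unordered) pair counts for a basket of n distinct items."""
    return n * (n - 1), math.comb(n, 2)


for n in (5, 10, 50):
    ordered, unordered = pair_counts(n)
    print(f"{n} items: {ordered} ordered pairs, {unordered} unordered pairs")

# cross-check the formula against explicit enumeration of a 50-item basket
assert sum(1 for _ in combinations(range(50), 2)) == pair_counts(50)[1]
```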

**One More Question**

Given that an item belongs to a hierarchy of product categories, what is the probability that an item or an element of the hierarchy is associated with another item or element of the hierarchy?

This query answers questions like…

- what items are associated with dental care in a typical shopper basket?
- what other categories are closely associated with the health and beauty care category?

The difference from the previous question is that the elements of the hierarchy are now added to the mix.

**Algorithm**

For each transaction, and for each unique unordered pair of items or hierarchy elements, calculate the sum of value (sum(qty*price)).
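Here is a Python sketch of the hierarchy version. It is an illustration under assumptions, not a definitive implementation:

- The hierarchy is given as a hypothetical child-to-parent map (`PARENT`) with the root left out.
- Each pair of line items is expanded into pairs of the items and their ancestors.
- An item is never paired with a category that contains it.
- A category may pair with itself when two items share it.
- Each distinct element pair receives the item pair's value once.
- The pair value is taken as the sum of qty × price of the two lines.

```python
from itertools import combinations

# Hypothetical hierarchy as a child -> parent map; the root is left out.
PARENT = {
    "toothpaste": "dental", "floss": "dental",
    "shampoo": "hair", "conditioner": "hair",
    "dental": "hbc", "hair": "hbc",
}


def lineage(node, parent):
    """Return the node followed by its ancestors, nearest first (root excluded)."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def hierarchy_pairs(basket, parent):
    """basket: list of (item, qty, price), one line per item.

    Returns {(element_a, element_b): weight} over items and hierarchy elements.
    """
    weights = {}
    for (i, qi, pi), (j, qj, pj) in combinations(basket, 2):
        value = qi * pi + qj * pj
        side_i, side_j = lineage(i, parent), lineage(j, parent)
        keys = {
            tuple(sorted((a, b)))
            for a in side_i for b in side_j
            # skip an item paired with a category that already contains it
            if not ((a == i and b in side_i) or (b == j and a in side_j))
        }
        for key in keys:  # each distinct unordered pair counted once per item pair
            weights[key] = weights.get(key, 0) + value
    return weights


basket = [("toothpaste", 1, 3.0), ("floss", 1, 2.0), ("shampoo", 2, 4.0)]
pairs = hierarchy_pairs(basket, PARENT)
```

A category such as `dental` then appears in the output next to the items it is bought with, which is what questions like "what items are associated with dental care" need.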

Say we have a catalog of 8 items with a 2-level hierarchy.

Below the root there are 2 level_1 elements.

Each level_1 element has 2 children, therefore level_2 has 4 elements.

Then the 8 items are the leaves of the tree.

The total number of elements is 15.

At item level, the total number of permutations is 8*7 = 56.

Across the levels (root, level_1 and level_2) there are 7 elements.

We remove the root, because every item is related to it.

So we have 6.

Each of these 6 can be associated with any of the 6, including itself.

This makes it 6*6 = 36.

At level_1, each of the 2 elements can be permuted with the 4 leaves it does not cover.

Permutations at level_1 with leaves = 2*4 = 8

At level_2, each of the 4 elements can be permuted with the 6 leaves it does not cover.

Permutations at level_2 with leaves = 4*6 = 24

Total permutations now = 56+36+8+24 = 124.

Including the hierarchy levels in the association for 8 leaves in a 2-level binary tree increases the permutations from 56 to 124.
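The count can be recomputed for any complete tree with a short Python function. It follows the counting rules of the walkthrough:

- leaf-by-leaf permutations;
- the root excluded;
- every non-root element paired with every non-root element, including itself;
- each element paired with the leaves it does not cover.

`permutation_terms`, `levels` and `branching` are hypothetical names.

```python
def permutation_terms(levels: int, branching: int = 2) -> dict[str, int]:
    """Count ordered pairs per term for a complete tree with `levels` category levels."""
    depth = levels + 1                  # leaves sit one level below the last category level
    leaves = branching ** depth         # 8 leaves for levels=2, branching=2
    terms = {"leaf x leaf": leaves * (leaves - 1)}

    # non-root category elements, each paired with every one of them including itself
    internal = sum(branching ** d for d in range(1, levels + 1))
    terms["element x element"] = internal * internal

    # each level-d element paired with the leaves it does not cover
    for d in range(1, levels + 1):
        covered = branching ** (depth - d)
        terms[f"level_{d} x leaf"] = branching ** d * (leaves - covered)
    return terms


for levels in (2, 3):
    terms = permutation_terms(levels)
    print(levels, terms, "total:", sum(terms.values()))
```

For `levels=2`, the four terms are 56, 36, 8 and 24, which sum to 124. With three category levels (16 leaves), the same rules give 612 ordered pairs against 240 at item level.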

The more levels and the bigger the basket, the more permutations there are: the count grows quadratically with the number of items and hierarchy elements involved.

**Market Basket Transformation**

The implementation strategy is a transformation that takes the line items of one transaction (a market basket) and produces a list of all unordered pairs in the basket.

The next step produces the unordered pairs of all elements of the hierarchy from the list of unordered pairs of products (leaves) across all transactions.

Then we form a job that has 3 transformations.

- A transformation that generates the list of transaction numbers
- A market basket transformation that receives the result rows of the previous transformation as its arguments, runs once for each row and dumps the results to a text file
- A transformation that reads the results from the text file, generates the unordered pairs of hierarchy elements, assigns a probability weight to each unordered pair and writes them to a fact table
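The three-step job can be sketched in a single Python process. Three plain functions stand in for the three transformations, and a CSV file stands in for the intermediate text file. This is a minimal sketch with these assumptions:

- The function names, CSV layout, `PARENT` hierarchy map and sample rows are hypothetical.
- An item is never paired with a category that contains it.
- The probability weight is each pair's share of the total pair value on its date.

```python
import csv
import os
import tempfile
from collections import defaultdict
from itertools import combinations

# Hypothetical source rows: (txn_no, date, item, qty, price), one line per item
LINES = [
    (1, "2024-01-02", "toothpaste", 1, 3.0), (1, "2024-01-02", "shampoo", 2, 4.0),
    (2, "2024-01-02", "toothpaste", 1, 3.0), (2, "2024-01-02", "floss", 1, 2.0),
]
# Hypothetical child -> parent map for the product hierarchy; the root is omitted
PARENT = {"toothpaste": "dental", "floss": "dental", "shampoo": "hair",
          "dental": "hbc", "hair": "hbc"}


def step1_transaction_numbers(lines, date):
    """Step 1: list the transaction numbers in the chosen grain (here, one date)."""
    return sorted({txn for txn, d, *_ in lines if d == date})


def step2_market_basket(lines, txn, writer):
    """Step 2: run once per transaction; write its unordered item pairs to the text file."""
    basket = [row for row in lines if row[0] == txn]
    for (_, day, a, qa, pa), (_, _, b, qb, pb) in combinations(basket, 2):
        x, y = sorted((a, b))
        writer.writerow([day, x, y, qa * pa + qb * pb])


def lineage(node):
    """Return the node followed by its ancestors, nearest first."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def step3_fact_rows(path):
    """Step 3: expand item pairs into hierarchy pairs and weight them per date."""
    weights = defaultdict(float)
    with open(path, newline="") as f:
        for day, a, b, value in csv.reader(f):
            side_a, side_b = lineage(a), lineage(b)
            keys = {
                (day, *sorted((x, y)))
                for x in side_a for y in side_b
                # never pair an item with a category that contains it
                if not ((x == a and y in side_a) or (y == b and x in side_b))
            }
            for key in keys:
                weights[key] += float(value)
    totals = defaultdict(float)
    for (day, _, _), w in weights.items():
        totals[day] += w
    # fact rows: (element_a, element_b, probability weight, date)
    return [(x, y, w / totals[day] if totals[day] else 0.0, day)
            for (day, x, y), w in weights.items()]


path = os.path.join(tempfile.mkdtemp(), "pairs.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    for txn in step1_transaction_numbers(LINES, "2024-01-02"):
        step2_market_basket(LINES, txn, writer)
facts = step3_fact_rows(path)
```

Writing the leaf pairs to a file keeps step 2 independent of the hierarchy. The expansion in step 3 can then be rerun if the hierarchy changes, without regenerating the basket pairs.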

In the transformation that generates the list of transaction numbers, we can control the grain of the market basket analysis.

If the grain is daily, irrespective of store, we generate the list of transactions from all stores on that date.
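A sketch of how the first transformation might expose the grain as parameters. The date is always a filter, and the store is an optional one; the per-store option is an assumption about what a finer grain could look like. The `(txn_no, date, store)` row layout and `transaction_numbers` are hypothetical, and transaction numbers are assumed to be unique across stores.

```python
from typing import Optional


def transaction_numbers(headers, date, store: Optional[str] = None):
    """Return the transaction numbers that make up one analysis grain.

    headers: iterable of (txn_no, date, store) tuples (hypothetical layout).
    store=None gives the daily grain across all stores; a store id narrows it.
    """
    return sorted(
        txn for txn, d, s in headers
        if d == date and (store is None or s == store)
    )


headers = [(1, "2024-01-02", "S1"), (2, "2024-01-02", "S2"), (3, "2024-01-03", "S1")]
print(transaction_numbers(headers, "2024-01-02"))        # [1, 2]
print(transaction_numbers(headers, "2024-01-02", "S1"))  # [1]
```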