## POWER BI: Modeling

#### Calculated Columns and Measures

##### Intro to Calculated Columns and Measures

The basics of modeling and the power behind Power BI involves creating new calculated columns and measures. Though sometimes used interchangeably, they get or calculate the data in very different ways. Anytime you do data analysis, you should be questioning whether the obtained result makes sense. If it does not than you most likely should have applied a measure instead of a calculated column or vice versa. There are many articles out there on when to use calculated columns and measures.

To summarize briefly: A calculated column is computed row by row in a table. When you create a calculated column the data lives inside your table. It is calculated row by row. When you like to use this data as a filter or slicer to segment some other data then you should use a calculated column. A calculated column is often used as an intermediate step to obtain a final result. When you want to determine when a particular transaction meets a specific requirment, use a calculated column.
A measure is the data that gets sliced or filtered. A measure does not live in your dataset, rather it calculated on the fly when called upon. How is this done? A measure aggregates all the rows in the data and returns the result. A measure changes based on the context of the data and is usually the final value you want to report on. When you want to find total sales amount or ROI use a measure.

##### Using Calculated Columns

A calculated column is a new column that is created in POWER BI from two or more existing columns. It is calculated row by row in the data table. A calulated column is often used in conjugation with slicers to filter or segment your data for analysis. That is we create a calculated column to act as a filter that we can later use. Another reason to create a calculated column is if you like to have the data stored in the dataset. For instance, say you a column that shows the value of cost. A new calcuated column, "Cost after Taxes" could be made using the original "Cost" column.

Example: Add a calcuated column to account for costs after tax. In the formula bar write:

                         Cost after Taxes = TransactionData[Cost]*(1.07)

• TransactionData is the table used
• Cost is the column in that table. A column name must always be enclosed in square brackets

In the example above, each row in the Cost column was evaluated and mutliplied by the corresponding tax rate of 7%. Since this is a calculated column you can see each result row by row in data view found under manage relationships.
This will not change the original data set (ie. the data set you are using). In order to do that, you need to make changes in the query editor itself.

Example: Define a calculated column to check for large transactions. In the formula bar write:

                      Large Transactions = IF(TransactionData[Revenue] > 10000,
"revenue from large transactions","revenue from all other transactions")


This will check row by row for each transaction to see if it was greater than 10,000. If so the first result: "revenue from large transactions" will be the result of that row.

##### Using Measures

A measure is sometimes incorrectly confused or used interchangeablly with a calculated column. However, unlike calculated columns, a measure is not stored in the table and therefore should NOT be used as a filter. In a measure, calculations are made on the fly by aggregating all the rows in a column and returning the result whenever called upon. Anytime you need to compute ratios, such as AOV (average order value) or ROI (return on investment) you should be creating a new measure. Using a calculated column in such cases will give you the wrong results. In summary, if you want to find the calculation for a value and report that number, you should be creating a measure.

Example: Writing your first measure. In the formula bar write:

                      Measure_for_Cost = SUM(TransactionData[Cost])


This creates the simplest possible measure above. We are summing up ( aggregarting ) all the cost rows in our data. In fact, Power BI will automatically do this for us with all columns having the following symbol:

The measure created above Measure_for_Cost will be identical to the Cost column in every possible context. Why create a measure then? If we have cost data and revenue data, but are missing Profit or ROI then we can use these columns to create a new measure. We can then visually display these new measures in Power BI and filter them if needed using calculated columns or other dimensions.

For example using the divide function:

                         ROI = DIVIDE(

SUM(TransactionData[Total Profit]),
SUM(TransactionData[Cost (total)]),
0
)


##### Two Illustrative Examples
###### Example 1

A calculated column and measure was created using two identical formulas:

                        Calculated Column Revenue = SUM(TransactionData[Revenue])

                        Actual Revenue = SUM(TransactionData[Revenue])


Yet there outputs were quite different when I plot the results on a bar graph. Can you figure out why? In this sample data, the actual total revenue for all the data is about $67.6 million. The calculated column revenue shows a staggering 4 trillion in revenue! (note that is still much less than the US National Debt.) If you go into "data view" you will see the results of "Calculated Column Revenue". It gave you what you asked, the sum of all transcations. In this case each row is a static value of 67.6 million. However, when you go into "Report View" and plot the result, it returns an aggregation of all the rows with$67.6 million. With this sample data set there is 60,516 rows of data. In otherwords, $67.6 million x 60,516 ≈ 4T. ###### Example 2 Below outlines an example of when you should be using measures. The final value or result you want to look at should be defined as a measure. When calculating ratios, measures need to be used. In this case, the reason to use measures is becauses a measure will aggregrate the total data while calculated columns do not. For example see the three sales records below: Customer Name Total Orders Sales Amount AOV Anna 1$1 $1 Bob 2$5 $2.5 Cat 1$4 $4 Here we can see that the total sales amount for all of our Candy Store customers is $$1+5+4 = 10 The row by row average order value is also computed. Bob has two orders (He came to the candy store this week and last week, then never came back), while Anna and Cat each came to the candy store once. In this case the true AOV is:$$\frac{\sum sales}{\sum orders} = \frac{1 + 5 + 4}{1 + 2 + 1} = \$2.5 $$If you try to compute this in POWER BI using a calculated column it will actually return:$${\sum{\frac{sales}{orders}}} = 1 + 2.5 + 4 = \$7.5 $$Try for yourself! Taking an average of the AOV is also not right. In otherwords:$$\frac{\sum metric1}{\sum metric2} \neq \sum{\frac{metric1}{metric2}}$$##### Summary of Difference Between Calculated Column and Measure Calculated Columns Measures Uses up more memory since it is stored in a table Calculated on the fly. Uses more CPU therefore may be slower. Used with slicers to filter and segment data Used to calculate the final numeric value Row by row calculation is involved for all rows in the table All rows in the column aggregaged and returned. No individial row by row calculation SUM(metric1/metric2) =$\sum{\dfrac{metric1}{metric2}}$SUM(metric1)/SUM(metric2) =$\dfrac{\sum metric1}{\sum metric2}\$

Up Next: The CALCULATE function.

• SEO