Real-time Analytics and Power BI

Real-time Analytics is a way of analysing data as soon as it's generated. Data is processed as it arrives, and insights are delivered to the business without delay.

Real-time Analytics is useful when you are building analytics and reporting that you need to respond to quickly. It ensures the analysis is updated with the latest available data when that data changes constantly. It's particularly useful in what we at Oxi Analytics class as "sense and respond" analytical use cases.

These sense and respond use cases usually arise where small but quick changes to a process make a significant impact on the result, where risks need to be minimised, or where we need to quickly identify changing patterns to avert serious damage to an area of the business, such as sudden and unexpected changes in customer behaviour. They are found across many business verticals and industry sectors, particularly in brand monitoring, digital marketing and manufacturing.

Power BI's real-time analytics features are used by organisations across the world such as TransAlta, Piraeus Bank S.A. and Intelliscape.io. You can read more about these use cases here.

Power BI delivers real-time analytics capabilities through its real-time streaming features. Let's explore these features to understand their capabilities in depth and, just as importantly, their limitations.

Real-time Streaming in Power BI

Real-time streaming allows you to stream data and update dashboards in real-time. Any visual or dashboard that makes use of a real-time streaming dataset in Power BI can display and update real-time data.

Types of Real-time datasets

There are three types of Power BI real-time streaming datasets designed for displaying real-time data:

  1. Push datasets
  2. Streaming datasets
  3. PubNub streaming datasets

Only the Push dataset allows historical data to be stored. If we want to build real-time analytical reports that show historic data as well as the latest changes, we need to use a Push dataset. The other two types, streaming datasets and PubNub streaming datasets, are used when we only want dashboard tiles that showcase the latest data point.

Here’s a table listing the major differences between all three datasets. You can find more information here.

| Capability | Push | Streaming | PubNub |
| --- | --- | --- | --- |
| Update dashboard tiles in real time | Yes, for visuals created via reports and then pinned to a dashboard | Yes, for custom streaming tiles added directly to the dashboard | Yes, for custom streaming tiles added directly to the dashboard |
| Data stored permanently in Power BI for historic analysis | Yes | No, it's only stored temporarily for an hour | No |
| Ability to build reports atop the data | Yes | No | No |

Push Dataset

This is a special case of a streaming dataset. When creating a streaming dataset in Power BI, enabling the 'Historic data analysis' option results in a Push dataset. Once this dataset is created, the Power BI service automatically creates a new database to store the data.

Reports can be built on these datasets like any other dataset. Power BI doesn't allow any transformations to be performed on a Push dataset, and it cannot be combined with other data sources either; however, you can add measures to the existing table. Data can also be deleted using the REST API.
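
As a rough illustration, here is a minimal Python sketch of pushing rows into a Push dataset through the Power BI REST API. The access token, dataset ID, table name and row fields below are placeholders you would replace with your own values.

  import requests

  # Placeholders: an Azure AD access token, the push dataset's ID and its table name
  ACCESS_TOKEN = "<aad-access-token>"
  DATASET_ID = "<push-dataset-id>"
  TABLE_NAME = "RealtimeData"

  url = (
      "https://api.powerbi.com/v1.0/myorg/datasets/"
      f"{DATASET_ID}/tables/{TABLE_NAME}/rows"
  )

  # Each row must match the schema defined when the dataset was created
  payload = {"rows": [{"timestamp": "2020-07-01T10:15:00Z", "sensor": "line-1", "value": 42.7}]}

  response = requests.post(url, json=payload, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
  response.raise_for_status()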

New Streaming Dataset

Streaming Dataset

Data gets pushed here as well, but there's an important difference: Power BI stores the data in a temporary cache, which quickly expires. The temporary cache is only used to display visuals that have some transient sense of history, such as a line chart with a time window of one hour.

Since there's no underlying database created, you cannot build reports using this data, and you cannot make use of report functionality such as filtering or custom visuals.

The only way to visualise this data is by creating a dashboard and adding a tile with “Custom Streaming Data” under the Real-Time Data section.

Real time data
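
If the streaming dataset was created with the API option, the Power BI service also exposes a push URL for it (shown under the dataset's API info). A minimal sketch of posting a reading to that URL, with a placeholder URL and fields, might look like this:

  import requests

  # Placeholder push URL copied from the streaming dataset's API info in the Power BI service
  PUSH_URL = "https://api.powerbi.com/beta/<tenant-id>/datasets/<dataset-id>/rows?key=<key>"

  # The payload is a JSON array of rows matching the fields defined for the dataset
  rows = [{"timestamp": "2020-07-01T10:15:00Z", "value": 42.7}]
  requests.post(PUSH_URL, json=rows).raise_for_status()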

PubNub Dataset

With this dataset, the Power BI web client uses the PubNub SDK to read an existing PubNub data stream. No data is stored by the Power BI service. PubNub is a third-party data service.

As with a streaming dataset, there is no underlying database in Power BI, so you cannot build report visuals against the data that flows in, and cannot take advantage of the other report functionalities. It can only be visualised by adding a tile to the dashboard, and configuring a PubNub data stream as the source.

Any web or mobile application which uses the PubNub platform for real-time data streaming could be used here.
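
For illustration, the sketch below publishes a reading to a PubNub channel with the PubNub Python SDK (v4+ style API); the keys, channel name and message fields are placeholders, and the Power BI tile would then be configured with the matching subscribe key and channel.

  from pubnub.pnconfiguration import PNConfiguration
  from pubnub.pubnub import PubNub

  # Placeholder keys from your PubNub account
  config = PNConfiguration()
  config.publish_key = "<publish-key>"
  config.subscribe_key = "<subscribe-key>"
  config.uuid = "power-bi-demo"

  pubnub = PubNub(config)

  # Publish a single reading; the Power BI dashboard tile subscribes to the same channel
  pubnub.publish().channel("sensor-readings").message(
      {"timestamp": "2020-07-01T10:15:00Z", "value": 42.7}
  ).sync()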

Streaming dataset

In general, when using a custom streaming dashboard tile, you can choose from five different visualisation types as shown in the below screenshot. These tiles, when added to the dashboard, will have a lightning bolt icon at the top left corner, indicating that they are real-time data tiles.

Streaming dataset

How to Choose a Dataset Type?

A Push dataset can be used when historic data analysis and building reports atop the dataset are crucial. The dataset can be created using the API option in the streaming dataset UI, and it can be connected to from either the Power BI service or Power BI Desktop.

A Push dataset can also be created using the ‘Azure Stream’ option. However, the dataset created using this method can only store a maximum of 200,000 rows. After hitting the limit, rows are dropped in a FIFO (first-in, first-out) fashion.

If the idea is to have dashboard tiles displaying pre-aggregated live data using simple visuals, then a streaming dataset is the perfect choice. This can only be connected using the Power BI service.

PubNub datasets are used when the data is generated using the PubNub data stream. Tiles created using this dataset are optimized for displaying real-time data with very little latency.

I hope that helped in answering some of the questions on real-time analytics using Power BI. Please feel free to contact me through contact@oxianalytics.xyz if you have any further questions I can help with.

Digital Strategy and Analytics in Pharmaceuticals

One reason for the success of big Pharma is superior prowess in sales and marketing. Once R&D (Research & Development) has delivered a new patented innovation with global market potential, the marketing and sales activities promote rapid global adoption and shorten the time to peak sales within the patent life.

Big Pharma has long dominated the available sales and marketing channels. This position of strategic advantage has been stable, held in place by a combination of conservative regulation and big Pharma's satisfaction with the status quo.

Current trends are threatening this stability. So, how might things change? In this blog I will explore how this instability could disrupt the status quo and create a new basis for competition. All players would be wise to take account of the instability and the new possibilities that could open up.

For example, Covid-19 has propelled more innovation in digital practices across healthcare in the last three months than in the previous three years. At the same time, Covid-19 has disrupted Pharma's promotion model. Polling of healthcare professionals suggests that there is every prospect that these practices and the disruption will persist. One possibility that opens up is that digital strategies could become increasingly important.

Healthcare professionals say they need help with digital innovation and new digital practices post Covid. The need for digital re-skilling and learning across the healthcare sector is significant. The healthcare professions need help with funding, digital skills, and digital tools.

Pharma has ample skilled professionals who have empathy for the caring professions and who are skilled in communicating new concepts and practices as well as in building relationships. Pharma could respond to these new needs and new opportunities by re-deploying its resources to help everyone transition to the new faster-moving, real-time digital world. Pharma resources could switch to communicating with and helping prescribing health professionals via digital conversations on digital channels.

Regulators would need to play their part. They have long constrained the ways that Pharma firms communicate and build relationships with healthcare professionals. However, regulators are interested in promoting consumer and patient welfare, healthcare innovation, and competition, all of which could be improved by a move to digital channels. Regulators are always open to discussion and negotiation. They have looked, and will continue to look, favourably on new digital practices that help patients, healthcare practitioners and pharmaceutical companies, provided those practices are open and competitive.

Regulations differ by legal jurisdiction. For this reason, firms have long delegated responsibility for local relationships from the global to the local market level. Digital conversations are handled locally, and discussions with regulators are conducted at the local and regional level. If they choose to take the initiative, big Pharma could orchestrate those local and regional regulatory discussions now.

New digital channels will be harder for big Pharma to dominate by any means other than superior marketing and relationship-building competence. Digital channels will allow smaller players with a compelling story for a niche audience to cut through and get their message across. They will obtain insights that they use both to shape their messaging and to target unmet needs. While individually these incursions into the larger firms' market will be small, collectively they will add up.

Local market teams in big Pharma will be taking on their smaller local market competitors head to head on a newly-level playing field. Local marketing teams will engage their local medical professionals digitally on topics that are locally relevant. They will use local digital marketing applications that support digital conversations, information sharing and relationship building.

This scenario would be a competitive marketplace, but there is one sustainable advantage that big Pharma could create by using data and analytics. This is a global database that could be a rich source of insights into healthcare professionals’ evolving priorities. Used for new insights this could be a long-term competitive advantage in customer understanding and segmentation.

To create this intelligence advantage, big Pharma firms could merge the local data obtained from local conversations captured by digital analytics applications with data from providers, payors, patient advocacy groups, web-based medical sites, third-party data brokers, channel partners and open data from government and inter-governmental sources. Once collected, structured, and mined for insights it could restore a source of strategic advantage from global scale and reach in marketing and sales to big Pharma. Technically these databases are not difficult to create using scalable cloud technologies such as Microsoft Azure and other cloud vendors in combination with enterprise-grade visual data analytics tools.

Digital Strategy as a Core Competence

Digital could become an important competence for all firms that wish to engage with healthcare professionals. Firms will keep competency high only with continuous improvement processes and continuous learning – ideally based on digital analytics. This is how they will adapt to the new era of Covid-resilient business practices. It could open big Pharma up locally to competition from digital-first niche competitors. While smaller businesses will not have the resources of the large pharmaceutical companies, digital is perfectly suited to niche businesses, and local data and analysis are equally available to all players. Firms of whatever size can craft a digital strategy which is right for them, their healthcare system, and their patients.

Conclusion

It is likely that the recent disruptive changes mean that all players in the pharmaceutical and healthcare sector will need a new strategy for digital. Large firms have both the advantages and the challenges of scale, but small players can easily access the affordable services and the data and analytics they need to compete in niche markets. Digital strategies, widely adopted, could supply the impetus for innovation and productivity across the healthcare sector.

Power BI, Google Sheets and BigQuery – Do they work together?

Hi Again! I’m back with another post on BigQuery, Power BI and Google Sheets. Today, we’ll be looking at how to connect Power BI and Google Sheets.

Ideally, I’d want to create a DirectQuery connection between Power BI and Google Sheets, because the data in my Google spreadsheet changes frequently, and we need the latest data to be reflected in the Power BI report. Unfortunately, Power BI doesn’t currently support a DirectQuery connection to Google Sheets.

This is where BigQuery will help us out. We’ll try to use BigQuery as an intermediary to connect Power BI to Google Sheets and access the data we need in the way we need it. We’ll need to explore creating a DirectQuery connection between Power BI and an external table on BigQuery.

What is a DirectQuery Connection?

Power BI offers different ways to connect to its data sources – Import, DirectQuery and Live connection. With a DirectQuery connection, the data isn't stored in the Power BI service but stays in the data source itself. Every time you refresh or interact with the report, Power BI queries the source for the data it needs.

Microsoft recommends using the 'Import' connection whenever possible. However, if the data changes frequently and reports must reflect the latest data, then DirectQuery fits best. Performance might take a small hit, but DirectQuery becomes essential when dealing with huge datasets which can't be imported into Power BI.

Data sources which allow for DirectQuery connections include Google BigQuery, SQL Server and Azure Data Lake. You can find more information about Power BI supported data sources here.

What is an External table?

Google BigQuery has two ways in which it stores its data:

  1. Native table – the data is imported and loaded into BigQuery, or a table is created in BigQuery and data is inserted into it. This consumes storage on BQ.
  2. External table – the data is not imported and stays in the source. The table is linked to the data source, and the data is fetched every time the associated BigQuery table is queried. This doesn't consume storage on BQ.

Any data which is stored on Google Drive can only be accessed as an external table in BigQuery, and this applies to Google Sheets as well.
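
If you prefer to script this step rather than use the web console, here is a rough sketch using the google-cloud-bigquery Python client; the project, dataset, table name and spreadsheet URL are placeholders, and the credentials used will also need Google Drive scope when the table is queried.

  from google.cloud import bigquery

  client = bigquery.Client(project="project_name")

  # Define an external table backed by a Google Sheet (placeholder URL)
  external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
  external_config.source_uris = ["https://docs.google.com/spreadsheets/d/<sheet-id>"]
  external_config.autodetect = True  # infer the schema from the sheet

  table = bigquery.Table("project_name.dataset_name.Test_data")
  table.external_data_configuration = external_config
  client.create_table(table)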

Time for some Action!

I have a Google spreadsheet named ‘Test_data’ on my Google Drive. I’ll load this spreadsheet to BigQuery using the web console.

I’ll presume that you have a BigQuery account, a project and a dataset to use for this exercise. Pick a dataset and click on the ‘Create Table’ option. You’ll then be presented with a screen like below. Fill in the details and your BQ external table is ready.

Power BI, Google Sheets and BigQuery

Now it's time to test the DirectQuery connection in Power BI. So, open Power BI and click on the 'Get Data' option. Select 'Google BigQuery' and click 'Connect'. In the resulting window, navigate to the dataset containing the 'Test_data' table.

Well, that’s strange! I had 4 tables in my dataset but I can only view 3 of them. This is because the Power BI Google BigQuery Connector doesn’t support External BigQuery tables. It only supports the Native BigQuery tables. Since Test_data is an external table, it doesn’t get listed here.

This information is not mentioned in the Power BI documentation for Google BigQuery either. So, we’ll have to look at alternative ways to get that data.

Is there a workaround?

Yes. We'll have to convert our External table into a Native table on BigQuery so that Power BI can access it. Let's look at how to get this done.

Creating a new BigQuery table

This can be done on the BQ web console with a DDL statement. Here it goes:

CREATE TABLE `project_name.dataset_name.Test_data_new`
AS SELECT * FROM `project_name.dataset_name.Test_data`;

The above statement creates a new table named 'Test_data_new' and copies the data from the 'Test_data' table into it. Now we have a Native table in BigQuery containing the Google spreadsheet data.

It's also possible to write a scheduled query that updates the Native table on a recurring basis once it has been created. Note that the minimum interval BigQuery allows for a scheduled query is 15 minutes, so with this solution your report gets updated data every 15 minutes. Since a plain INSERT would append the same rows again on every run, the simplest approach is to recreate the table each time. You could use the below query to get started.

CREATE OR REPLACE TABLE `project_name.dataset_name.Test_data_new`
AS SELECT * FROM `project_name.dataset_name.Test_data`;

Connecting to Power BI

Let’s retrace our steps. Open Power BI and click on the ‘Get Data’ option. Search for Google BigQuery and select it. Once it opens up, navigate to your new table (Test_data_new) and click on the Load button.

Power BI, Google Sheets and BigQuery

You’ll then be presented with a Connecting settings window, where you need to select the DirectQuery option and click OK.

Power BI, Google Sheets and BigQuery

There you go! I can finally access my Google spreadsheet data in Power BI, and the report gets updated data every 15 minutes. That is a practical solution to this problem. Please feel free to ask any questions by contacting me through the contact box!

Cloud Services: How to Choose between Azure and AWS

Cloud services are now the default choice for most firms. The economies of scale available to the cloud service providers make this the most efficient way to provide storage and cloud computing capacity. Customers now pay for services based on capacity and usage, and capacity can be scaled as required.

Management of cloud services costs can now be based on business value judgements. What is the business value of the data we are buying, storing and processing and is it greater than the costs? Cost control is a simple matter of rationalising, or archiving, unused data and checking that any processing adds value. This business transparency simplifies the management of IT costs, not least by making them variable, traceable and comparable across business units.

There are three big cloud services platforms, provided respectively by Microsoft, Amazon and Google, which have a complete set of service offerings; other platforms with partial service offerings, such as Salesforce and Adobe; and smaller players that offer only infrastructure as a service. In many service areas, "platforms" are being built by acquisition to broaden a core service as the platform strategy is adopted more widely to lock in business customers. This means that any choice of an individual technology is also a platform choice that could have longer-term implications.

Amazon Web Services and Microsoft Azure are at the heart of the Amazon and Microsoft platforms respectively, offering services in computing, delivery, storage, database, developer tools, and other functionality.

What is Azure?

Microsoft Azure was launched in 2010. Azure data analytics is a key offering along with solutions for web and mobile applications that can integrate seamlessly with the enterprise environment to achieve efficiency.

Pros

  • High-end testing capabilities, with testing tools for mobile apps, digital marketing, IoT devices and business apps.
  • Azure has a private cloud computing environment and does not share access with third-party cloud platforms. This makes it an excellent choice as a platform-as-a-service.
  • Azure has autoscaling options, ensuring an Azure developer has the capacity required.
  • As a Microsoft offering, it integrates optimally with other Microsoft products such as Windows Server, Active Directory and so on.

Cons

  • Azure documentation isn't very easy to navigate and is sometimes incomplete or out of date; this is a common complaint in Azure forums.
  • Azure's integration with platforms other than Microsoft's is limited.

What is AWS?

AWS was launched in 2006 and provides tiered cloud development platforms for content delivery and computation. AWS offers hundreds of services that cover a wide range of solutions in storage, networking, and delivery, with hundreds of templates in several programming languages.

Pros

  • Enterprise-friendly, scalable services that offer great flexibility for high-end computing. The pay-as-you-go model means subscribers purchase only the functionality they require.
  • Covers almost all the cloud services an organisation may need. There are several application development platforms that offer dedicated solutions with a unique hardware setup for each.
  • A high level of security is offered in the cloud applications, with strict management policies, configuration settings and multiple encryption layers.
  • Autoscaling ensures AWS developers have the resources they require.

Cons

  • The hybrid structure of AWS can be a setback for certain cloud computing solutions, as it integrates private and third-party public cloud frameworks.
  • Complexity: An extensive list of services and high-quality resource offerings means AWS developers require a more elaborate knowledge of the environment.

Comparative study of Azure and AWS

The table below compares the key features of Azure and AWS and highlights some key differences.

| Feature | Azure | AWS |
| --- | --- | --- |
| Computation | A Virtual Hard Disk (VHD), equivalent to a machine instance, is used for computation through virtual machines. The user, Microsoft or a third party can pre-configure the VHD. | EC2 users can configure their virtual machines or use pre-configured machine images for computation. The user can choose the size, power, memory, regions and zones to launch. |
| Storage | Block Blobs, which consist of blocks, are used for storage, and large Blobs are uploaded efficiently. The Cool and Archive access tiers of Azure Blob storage allow for data archiving. | Temporary storage is created once an instance begins and destroyed when it ends. There is also block storage that can be attached to or separated from an instance. Object storage is offered with S3 and data archiving services with Glacier. Support for relational and NoSQL databases and Big Data is provided. |
| Networking | A virtual network is used for networking and content delivery. VPN Gateway is used for cross-premises connectivity. Load Balancer and Application Gateway are used for load balancing. Networking is provided by various partners and networks that interconnect with data centres using different products. | A virtual private cloud is used for networking and an API is used for cross-premises connectivity. Elastic Load Balancing is used to balance network load. |
| Integration | Native integration is offered for VBS, SQL Database and Active Directory, and it works well with .NET. There aren't many open source integrations, but it supports Linux and Hadoop clusters. | There are open source integrations available with AWS, such as GitHub and Jenkins. It works well with Linux servers as well. |
| Ease of use | Easier to adopt for Windows platform users. Simple to integrate cloud instances with Windows servers and create a hybrid environment. | User-friendly interface, diverse array of features, well documented, offers flexibility and customisation. |
| Pricing | Pay-as-you-go model, charged per minute. Short-term plans are offered with options of pre-paid and monthly charges. | Pay-as-you-go model, charged per hour of usage. Instances can be purchased in some circumstances. |

Conclusions

Ultimately, the choice between them is a platform choice. Some firms see themselves as committed to the Windows and Microsoft platform, are comfortable with the easy interoperability of all the tools in the Microsoft platform and not dismayed by any lack of interoperability with other best of breed tools.

Amazon is chosen by firms which are primarily looking for a technical edge in infrastructure as a service and are prepared to look more widely for complementary best of breed tools and to invest more effort in extracting advantages from them.

Whichever provider you choose, you will be reaping the benefits of a highly scalable cloud solution that can cater to the ever-fluctuating business requirements.

CFO’s Guide to Using Algorithms

You already make decisions using common sense and data. Algorithms help by making it practical to use a lot more data – if you have it. The logic behind the decisions, with or without algorithms, is often the same.

For example, you already make decisions and predictions based on your understanding of relationships like the causal relationships between your sales spend and sales volume. You already try to understand your customers by grouping them together into segments based on similarity or differences – perhaps on their propensity to spend or their preferences – so that you can treat them appropriately. Perhaps by sending them different messages, messages that will resonate more.

If you have the data, algorithms can process it in a reasonable time to come up with a better understanding of relationships and better groupings (classifications) – perhaps by revealing patterns that you were not aware of before.

Commonplace Examples

Predictions

If you want to predict what sales will be next year, then you could assume that there is a direct relationship between sales and the spend on salespeople. If you are increasing the numbers of salespeople you could predict a corresponding increase in sales, with some adjustment for ramping up. If you want to consider more factors, then you might want to assess the relative importance of investments in marketing, customer service, trade discounts and revenue growth management activity. Algorithms use data on relationships between these factors and sales to make a sales prediction. Such approaches can help create powerful marketing data analytics and sales analytics.
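
As a simple illustration, a regression along these lines might look like the sketch below (scikit-learn, with made-up numbers and column names standing in for your own data):

  import pandas as pd
  from sklearn.linear_model import LinearRegression

  # Hypothetical history: spend on each lever and the sales achieved
  history = pd.DataFrame({
      "salespeople":     [10, 12, 15, 18, 20],
      "marketing_spend": [200, 220, 260, 300, 340],   # in thousands
      "trade_discounts": [50, 55, 60, 70, 75],
      "sales":           [1000, 1150, 1400, 1700, 1900],
  })

  model = LinearRegression()
  model.fit(history[["salespeople", "marketing_spend", "trade_discounts"]], history["sales"])

  # Predict next year's sales for a planned level of investment
  plan = pd.DataFrame({"salespeople": [22], "marketing_spend": [360], "trade_discounts": [80]})
  print(model.predict(plan))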

Ranking Prospects

If you wanted to identify where to focus the attention of your B2B salesforce, you could simply assume that company size is the best predictor of potential and rank prospects for your sales team in order of their annual sales turnover.

If you want to consider more features than just turnover size, then you would want to use your understanding of current key accounts to check which other features are relevant and produce meaningful sales analytics. For example, does industry sector affect sales to customers, does growth rate, does turnover per employee? Algorithms use data to check these relationships for significance and group customers accordingly.

Now you could group new prospects on how similar they are to these groups of existing customers. If you assemble data on a hundred thousand companies, many that your salespeople will know nothing about, you can use an algorithm to group and prioritise them efficiently by similarity with your current best customers.
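
As a rough sketch, with invented feature names and toy numbers, a nearest-neighbour approach to that prioritisation could look like this:

  import pandas as pd
  from sklearn.neighbors import NearestNeighbors
  from sklearn.preprocessing import StandardScaler

  # Hypothetical features for current best customers and for new prospects
  best_customers = pd.DataFrame({
      "turnover_m": [120, 80, 200, 150],
      "growth_pct": [5, 12, 3, 8],
      "turnover_per_employee": [0.4, 0.6, 0.3, 0.5],
  })
  prospects = pd.DataFrame({
      "turnover_m": [90, 300, 20, 140],
      "growth_pct": [10, 2, 15, 6],
      "turnover_per_employee": [0.5, 0.2, 0.7, 0.45],
  })

  # Scale the features so no single one dominates the distance measure
  scaler = StandardScaler().fit(best_customers)
  model = NearestNeighbors(n_neighbors=1).fit(scaler.transform(best_customers))

  # Rank prospects by how close each is to its nearest existing best customer
  distances, _ = model.kneighbors(scaler.transform(prospects))
  prospects["rank"] = distances.ravel().argsort().argsort() + 1
  print(prospects.sort_values("rank"))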

Relationship Algorithms

Relationship algorithms base their predictions on the relationships between values using mathematical methods (e.g. regression, Markov chains).

Clickstream Analytics

Clickstream analytics tracks relationships between clicks – in what sequences or chains of clicks do people move through the pages of your website? One click leads to another, so each click has a certain probability of following another, and you can predict the likelihood of the next click. For marketing purposes, you can cluster clickstreams to identify different types of users following different paths, assigning personas to these paths. You can also do clickstream analysis in real time (using tools that work on data streams) to predict visitor clicks and decide which ads or messages should be shown to the visitor next.
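
To make the chain-of-clicks idea concrete, here is a small sketch that estimates click-to-click transition probabilities from a handful of made-up sessions:

  import pandas as pd

  # Hypothetical sessions: each list is the sequence of pages one visitor clicked through
  sessions = [
      ["home", "products", "pricing", "checkout"],
      ["home", "blog", "products", "pricing"],
      ["home", "products", "checkout"],
  ]

  # Build (current page, next page) pairs from every session
  pairs = [(a, b) for s in sessions for a, b in zip(s, s[1:])]
  transitions = pd.DataFrame(pairs, columns=["current", "next"])

  # Row-normalised counts give the probability of each possible next click
  matrix = pd.crosstab(transitions["current"], transitions["next"], normalize="index")
  print(matrix)   # e.g. the 'products' row shows how likely 'pricing' vs 'checkout' is next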

Credit and Liquidity Risk

In lending, you can use relationship methods to predict a probability of default and price the risk better. You can use relationship methods on quite complex scenarios, for example, in banking, what are the chances of a combination of related events occurring such that a bank runs out of cash?

Classification Algorithms

Grouping algorithms use a mathematical process called classification. There are different algorithms to use depending on complexity. Clustering algorithms such as K-means help you explore and organise data, dividing it up into like groups, while supervised classifiers such as support vector machines assign cases to known groups. Some make allocations based on decision trees (e.g. random forest), making an allocation decision at each node in the tree. You can use classification methods to answer similarity questions like: "is this prospect in the high potential group or the low potential group?" As a lender you can also use classification to identify customers who are a credit risk based on their similarities with past defaulters.

Dynamic Segmentation

You can find new ways to define customer segments, grouping customers together on the features that are shown by the data to be significant for your business – not just their demographics, but all the data that you have on them and their behaviour, including when they visit your website, communicate with you, or buy from you.

Customer Churn Predictive Analytics

You can use classification methods to predict when customers are going to stop doing business with you, based on their similarities in behaviour or situation with other customers who churned – so you can intervene to retain them, dealing with the underlying service issue or making them a new offer.
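
A minimal churn-scoring sketch of this kind, using a random forest on invented behaviour features, might look like the following:

  import pandas as pd
  from sklearn.ensemble import RandomForestClassifier

  # Hypothetical history: behaviour features plus whether the customer eventually churned
  history = pd.DataFrame({
      "months_active":   [24, 3, 36, 6, 48, 12],
      "support_tickets": [1, 5, 0, 4, 2, 6],
      "monthly_spend":   [500, 80, 900, 120, 700, 150],
      "churned":         [0, 1, 0, 1, 0, 1],
  })

  model = RandomForestClassifier(n_estimators=100, random_state=0)
  model.fit(history[["months_active", "support_tickets", "monthly_spend"]], history["churned"])

  # Score current customers by their predicted probability of churning
  current = pd.DataFrame({"months_active": [9, 30], "support_tickets": [5, 1], "monthly_spend": [100, 650]})
  print(model.predict_proba(current)[:, 1])   # higher value = more at risk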

Predictive Maintenance

You can use classification to predict when a machine is at risk of breaking down, based on patterns of machine operation that are similar to those preceding previous breakdowns.

Visual Recognition

There are algorithms that can classify images based on their similarity to other images, so they can answer questions like; is this circuit board OK or not OK?

Fraud Detection

There are also algorithms to identify anomalies or outliers in data, for example for checking for anomalous transactions that don’t fall into the “normal” group to help to detect fraud.
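
As an illustration, an anomaly detector along these lines could be sketched with scikit-learn's IsolationForest, again on made-up transaction features:

  import pandas as pd
  from sklearn.ensemble import IsolationForest

  # Hypothetical transactions: amount and hour of day
  transactions = pd.DataFrame({
      "amount":      [25, 30, 27, 22, 31, 2500, 28],
      "hour_of_day": [10, 11, 9, 14, 13, 3, 12],
  })

  # Flag roughly the most unusual 15% of transactions for review (-1 marks an outlier)
  detector = IsolationForest(contamination=0.15, random_state=0)
  transactions["flag"] = detector.fit_predict(transactions)
  print(transactions[transactions["flag"] == -1])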

Recommendations

There are algorithms to generate recommendations based on similarities in people’s demonstrated preferences. For retailers, this is groups of products that are often bought together.

Bundle Pricing

You can use your data on products which are often bought separately to predict the discount that will sell the goods as a bundle and increase profit.

Matching Algorithms

Propensity to Consume

There are matching algorithms that estimate the propensity of someone to buy a specific product, (or, in medicine, to respond to a specific treatment).

Text Matching

There are algorithms that can match text, classify documents accordingly, and transform textual data into a more useful format.

Conclusion – Choosing Algorithms

Choosing algorithms is common sense. If you have the data, they can add greater understanding and validation to your decision-making. Compare these example cases to what you want to achieve. Express what you want to do in a way that algorithms can address. Then there will be obvious algorithms to choose from. Try them out to see which algorithm works best. There are practical trade-offs between data volume, processing time and cost, accuracy, bias, speed and efficiency, and you will only discover the optimal solution by iterating through the options. You will see where you have enough data and where you don't. You will be able to assess the value that sourcing additional data might have.