BI & Warehousing

Migrating from OBI to Incorta

Dylan's BI Notes - Wed, 2020-12-09 11:50
I am sharing my experience of migrating from OBI to Incorta. Process: start with the Incorta EBS Blueprint; configure and customize it for the deploying company; optionally, demo the Fusion Connector; preview and demo to business users using their own data; provide an analysis of existing OBI dashboard usage to help prioritize the replacement project; provide the lineage […]
Categories: BI & Warehousing

Oracle App Cloud and Incorta

Dylan's BI Notes - Wed, 2020-09-30 13:35
OTBI is great. But when people migrating from Oracle EBS to Oracle Cloud Apps want to view data from both EBS and Oracle Cloud, Incorta becomes a cost-saving and quick-to-implement solution that does not require building a data warehouse. Incorta is not a data warehouse, although it does have the data […]
Categories: BI & Warehousing

Oracle Analytics Server is here

Rittman Mead Consulting - Mon, 2020-02-03 09:47

Oracle has been talking about this product for months and we’re pleased to tell you the wait is over (for Linux users anyway): Oracle Analytics Server (OAS) 5.5.0 is now available for download on eDelivery (search for Oracle Analytics Server).

Oracle Fusion Middleware needs to be downloaded separately

If your organisation uses Oracle technology for data analytics, then the likelihood is you’ve heard of Oracle Analytics Cloud (OAC). You may have also come across the latest addition to the family, OAS, the new on-prem version of OAC, set to eventually replace OBIEE.

The umbrella term, Oracle Analytics, now includes:

  • Oracle Analytics Server (OAS)
  • Oracle Analytics Cloud (OAC)
  • Oracle Analytics for Applications (OAX)

Whilst OAC is the jewel in the crown and will receive regular quarterly updates, these updates will also be reflected in OAS. You’ll be pleased to hear that current OBIEE users are automatically licensed for OAS - the logic being that OAS becomes a stepping stone on your journey to the Cloud. If you’re buying OAS new, the licensing model is the same as the current OBIEE model.

OAS looks almost the same as OAC, minus some features, like the Natural Language Generator. This feature generates explanations of your visualisations in 28 different languages and will probably be included in a later version of the tool.  

How does OAS compare to OBIEE and OAC?

  • Licensing: OAS now includes options like Data Visualization (DV) and Mobile, which were previously considered extras.
  • Data Visualization: Oracle’s self-service visualisation tool does what it says on the tin, allowing you to decipher your enterprise data with intelligent visuals. It now includes almost all the new features available in OAC - a big step forward compared to the DV version available in the latest OBIEE.
  • Data Flows: clean and transform your data via a GUI-based tool without leaving your analytical platform.
  • Machine Learning: all the goodies related to “one-click forecast” and “Explain”, plus the full ML capabilities, are now included in the on-prem Oracle Analytics edition!
  • Configuration Options: OAS provides the "OBIEE"-type configuration options, where you can tweak each component individually.

Oracle’s aim is for users to achieve “100% data literacy”, and it plans to do this via its vision for analytics: augmented, collaborative and integrated. OAS really plays into this strategy, allowing users to employ data science and machine learning techniques to both analyse current trends and predict future ones (find out more in this blog post).


Talk to us about how to migrate from OBIEE to OAS or OAC. We can help you with every deployment scenario, including on-prem, hybrid, full public cloud, or a mix and match of these suited to your needs. Email us to arrange a chat with one of our team.

Categories: BI & Warehousing

What's new in OAC5.5?

Rittman Mead Consulting - Mon, 2020-02-03 03:01

Last Friday, alongside Oracle Analytics Server (for which a blog post is coming), a new OAC version came out. Let's have a quick look at all the new features it includes!


Maps

If you use Maps often, then there is a good list of new options available to you! The very first is the possibility to associate a Map Layer with a Data Column directly in the data source definition.

Let's say you have a City Zones column in your dataset, which divides a city into customized areas based on your company's needs, and you have a map layer defining those areas geographically (e.g. with a GeoJSON file). I created an example with Verona, the city where I live, using a custom GeoJSON file.


The upload of custom shapes and their usage in OAC has been available for some time; however, you as project creator had to associate the City Zones column with the correct Map Layer for each map visualization included in your project. Now you can define the Map Layer to Data Column association once and for all at the data source level, so every map using the City Zones column will automatically use the correct layer.
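For reference, a custom zone layer of this kind is just a GeoJSON FeatureCollection whose features carry the zone name as a property that can be matched against the data column. A minimal sketch (the zone name and coordinates are made-up illustrative values near Verona):

```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": { "City Zone": "Centro Storico" },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [[10.99, 45.44], [11.00, 45.44], [11.00, 45.45], [10.99, 45.45], [10.99, 45.44]]
        ]
      }
    }
  ]
}
```

The property used as the match key (here "City Zone") is what the Map Layer to Data Column association relies on, so its values must line up with the values in your dataset column.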


Another cool new feature in Maps is AutoFocus on Data, meaning that the visualization will automatically zoom and center the map appropriately based on the dataset presented, and rearrange it if the filtering changes.

Pivot Tables

Another new option is available in pivot tables, where you can now set Totals and Subtotals Above and Below, as you were used to doing in Answers. Like in the "old" tool, you can set a different format for the Totals and Subtotals, with coloring, background and font formatting options available. You now have full control of the layout and can make beautiful or horrible (like the below) color choices.

Visualizations

The perfect visualization is now available: the Spacer Viz! This is an empty visualization that you can add to your canvas, allowing you to optimize the layout in cases where you need extra white space.


Another new feature in this release is the Custom Background: it's now possible to define a color or an image as the background for the whole project or for a single canvas. The image can be a URL reference or uploaded from the desktop. There are also options to position the image on the screen and to auto-fit the image to the window size. Adding a custom background to the whole project means that every time a new canvas is added, it will already have the selected image/color by default.


Another addition is the Butterfly Viz: this view was already available as a plugin from the Oracle Analytics Library, and now becomes native in OAC 5.5. The butterfly viz is useful when comparing two metrics across the same dimension.


By default the two metrics are on the same scale, but there is also a "Synchronized Scales" option that, when set to OFF, will show the metrics on different scales.

Datasources and Data Gateway

A new datasource definition for Oracle NetSuite is now available, allowing the connection by passing the parameters Datasource, Account ID and Role ID on top of the usual Host, Username and Password.


An enhancement has also been published for the Oracle Database connection: you can now select between a Basic and an Advanced connection. The Basic option should be used when connecting to single-node databases. The Advanced option, on the other hand, is useful when connecting to RAC cluster DBs, where multiple hostnames and ports need to be listed. When selecting the Advanced option we can simply add a custom connection string like the below.
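A RAC connection string of that kind typically lists every node in the ADDRESS_LIST of a TNS descriptor. A sketch with hypothetical hostnames and service name:

```
(DESCRIPTION=
  (ADDRESS_LIST=
    (ADDRESS=(PROTOCOL=TCP)(HOST=racnode1.example.com)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=racnode2.example.com)(PORT=1521)))
  (CONNECT_DATA=(SERVICE_NAME=myservice.example.com)))
```

Listing both nodes lets the client fail over to the second address if the first is unreachable.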


An option is also available to use Data Gateway with Essbase sources, making the OAC connection to on-premises Essbase only one click away, by simply enabling the Use Data Gateway option on the screen. The usage of Data Gateway is also now available in BI Publisher, allowing pixel-perfect reporting from on-premises datasources.


Another option in BI Publisher is Data Chunking, extremely useful for big reports since it allows the report to execute as multiple sub-jobs in parallel, with a final job consolidating the results into a single output.

That's the news for this release. Do you want more detailed examples of a particular feature? Let me know in the comments and I'll write about it!

Categories: BI & Warehousing

A decade in data, a decade to come

Rittman Mead Consulting - Fri, 2020-01-10 08:52

First of all I’d like to take the chance to wish everyone in our network, colleagues, customers and friends a very happy new year.

The last decade was an incredible one for the data industry.  I realised before writing this that 2010 was actually when I started at my first data management consultancy, so it’s given me the perfect chance to reflect on what I’ve witnessed in the last 10 years.  

Data Insights out of IT and into the business

One thing that stands out is the relevance of insights through data within business functions.  The reason why we do what we do at Rittman Mead is because we believe that data solutions will drive positive change for us all in the future.  Making it accessible, insightful and interesting to people who are doing great things for great companies is the goal for all of us.

During the last decade we saw the mass market adoption of line of business data discovery tools such as Tableau, Oracle Data Visualization, QlikSense & Microsoft PowerBI.  All wonderfully innovative products that have flown the flag for bringing insights through data into the business.  

Oracle DV Geo Layers

What is Big Data?

Also, what has happened to the Big Data hype, which was huge back in 2010?

Ten years on, do we think it lived up to its hype? Do we even know what it means? On reflection, was it as groundbreaking as we might have thought at the time? This is an argument that could be had but might never end.

Public Cloud Cover

One thing is for sure - the adoption at scale of public Cloud, SaaS, storage and what seems like infinite amounts of processing power has been huge since 2010. We've seen AWS and Microsoft Azure leading the way, with Google and Oracle Cloud Infrastructure catching up. We know there are pros, cons, knowns and unknowns when moving to Cloud-hosted applications and platforms. And we know one thing for certain - come 2030 the Cloud will be bigger, whether we like it or not.

Streams of Events

Finally the emergence of event driven architectures is as innovative as it is complex. Large growth companies like Uber have built their business model on the ability to process huge amounts of events at low latency in order to operate.  Think demand + supply and Uber’s dynamic pricing model.  Fortunately open source projects like Kafka and particularly organisations like Confluent, the company behind Kafka, have made it easier to integrate this functionality to businesses throughout the world.

Analysing stream events in realtime using Kafka KSQL

It wouldn’t be fair to say that the innovation in the data industry has always been used ethically in the last 10 years, and this is something we all have a part to play in going forwards. The Facebook / Cambridge Analytica scandal certainly made everyone reflect on the importance of using data for positive and ethical outcomes rather than underhanded and opaque tactics.

And what's next?

As we embark on a new decade, we at Rittman Mead are excited about what is to come. Artificial Intelligence, Machine Learning and Augmented Analytics are going to hit the mainstream and will play a part in all of our lives. But the good news is: there will always be a place in this market for people. Data is necessary, sophisticated machines are necessary, but it all relies on people to work, whether that be building an algorithm or making a decision based on the information a particular algorithm has generated. We will see more automation, and this is a brilliant thing. Robotic Process Automation (RPA) and autonomous databases such as Oracle’s offering are designed to breed innovation, not replacement.

Come and meet us in February

For us it all starts in February at Oracle OpenWorld Europe. We will be exhibiting and presenting there, and there will be insightful, thought-provoking and relatable content throughout the two days. Augmented analytics and autonomous will be at the top of the agenda, and rightly so. What’s more, it’s completely free to attend. Registration details can be found here:

We’d love to meet with you if you do decide to come.  To make this easier (as we know there is a lot to cover on days like these) we have set up a calendar where you can book a meeting slot with one of our sales team or technical consultants.  We can talk about:

  • Oracle
  • AI/ML
  • What are we doing?
  • What are you doing?
  • Anything you like in an informal setting

Please feel free to book a time slot using the link below:

Finally we wish everyone a very prosperous new decade - who knows where we will be by 2030!

Categories: BI & Warehousing

Machine Learning and Spatial for FREE in the Oracle Database

Rittman Mead Consulting - Fri, 2019-12-06 04:34

Last week at UKOUG Techfest19 I spoke a lot about Machine Learning both with Oracle Analytics Cloud and more in depth in the Database with Oracle Machine Learning together with Charlie Berger, Oracle Senior Director of Product Management.


As mentioned several times in my previous blog posts, Oracle Analytics Cloud provides a set of tools helping Data Analysts start their path to Data Science. If, on the other hand, we're dealing with experienced Data Scientists and huge datasets, Oracle's proposal is to move Machine Learning where the data resides with Oracle Machine Learning. OML is an ecosystem of various options to perform ML with dedicated integration with Oracle Databases or Big Data appliances.


One of the best-known branches is OML4SQL, which provides the ability to do proper data science directly in the database with PL/SQL calls! During the UKOUG TechFest19 talk, Charlie Berger demoed it using a collaborative notebook on top of an Autonomous Data Warehouse Cloud.
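As a flavour of what those PL/SQL calls look like, here is a sketch using the DBMS_DATA_MINING package, which trains a model from a table and lets you score rows with the PREDICTION SQL function. The table, column and model names are hypothetical, not from the talk:

```sql
-- Train a classification model on a (hypothetical) CUSTOMERS table
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'CHURN_MODEL',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CUSTOMERS',
    case_id_column_name => 'CUSTOMER_ID',
    target_column_name  => 'HAS_CHURNED',
    settings_table_name => 'CHURN_SETTINGS'); -- optional table of algorithm settings
END;
/

-- Score new rows directly in SQL with the trained model
SELECT customer_id,
       PREDICTION(CHURN_MODEL USING *)             AS predicted_churn,
       PREDICTION_PROBABILITY(CHURN_MODEL USING *) AS churn_probability
FROM   new_customers;
```

The point of the OML4SQL approach is visible here: both training and scoring happen inside the database, so the data never has to move.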


Both Oracle ADW and ATP include OML by default at no extra cost. This wasn't true for all the other database offerings, in the cloud or on-premises, which required an additional option to be purchased (the Advanced Analytics option for on-premises deals). The separate license requirement was obviously something that limited the spread of this functionality, but I'm happy to say that it's going away!

Oracle's blog post yesterday announced that:

As of December 5, 2019, the Machine Learning (formerly known as Advanced Analytics), Spatial and Graph features of Oracle Database may be used for development and deployment purposes with all on-prem editions and Oracle Cloud Database Services. See the Oracle Database Licensing Information Manual (pdf) for more details.

What this means is that both features are included for FREE within the Oracle Database license! Great news for Machine Learning as well as Graph Database fans! The following tweet from Dominic Giles (Master Product Manager for the Oracle DB) provides a nice summary of the licenses, including the two options, for Oracle DB 19c.

The #Oracle Database now has some previously charged options added to the core functionality of both Enterprise Edition and Standard Edition 2. Details in the 19c licensing guide with more information to follow.

— dominic_giles (@dominic_giles) December 5, 2019

But hey, this license change also affects older versions, starting from 12.2, the oldest one still in general support! So, no more excuses: perform Machine Learning where your data is - in the database with Oracle Machine Learning!

Categories: BI & Warehousing

Scalable Distributed BI Architecture

Dylan's BI Notes - Tue, 2019-11-26 19:15
Incorta, a scalable distributed BI system...
Categories: BI & Warehousing

Rittman Mead at UKOUG TechFest 19

Rittman Mead Consulting - Tue, 2019-11-19 10:25

Like every year, December for the Oracle community means UKOUG! This time it's special, since the event, named TechFest19, will be hosted at The Grand Brighton Hotel, near our base office!


Let's talk about the important stuff first: we are organising a TechFest party featuring "The Chaps" on Monday 2nd Dec between 7PM and 10:30PM at our office at Platf9rm, Hove Town Hall. We'll be having a few drinks while enjoying the live music; if you are interested, register yourself here!


Now on to the main topic, the TechFest! Rittman Mead will be well represented this time, with four talks:

Data the Missing Ingredient

Monday 2nd at 14:15, Location: Stage 4

Jon Mead, our CEO, will introduce you to the concepts of Data Management, Engineering and Governance, how they should be addressed across a wide range of projects and which Oracle tools are there to help you.

Become an Equilibrista: Find the Right Balance in the Analytics Tech Ecosystem

Tuesday 3rd at 11:00, Location: Stage 2


Do you want to understand how to avoid the "Excel Chaos" in your organization? Let me show you how you can balance Centralized and Self-service analytics, taking the best of both worlds but still using a unique tool: Oracle Analytics Cloud!

Is it Corked? Wine Machine Learning Predictions with OAC

Tuesday 3rd at 15:45, Location: Stage 2

Do you love Wine? And maybe you don't dislike Analytics? Then join me in understanding how Oracle Analytics Cloud can be used for data-science! We'll be analysing a Wine dataset and using OAC to create a predictive model scoring wine quality! If you are a business analyst looking to start your path into Machine Learning, this session is a kickstarter!


Picking a Good Wine for <$20 Using Oracle Autonomous, Machine Learning and Analytics Cloud

Monday 2nd at 09:00, Location: Stage 2

Using the same dataset as the talk above, Charlie Berger, Sr. Director of Product Management, will show you how to build a predictive model by performing Machine Learning directly within the Oracle Autonomous Data Warehouse, all accessed via a notebook interface. I'll then show you how we can integrate such a model within OAC and show the main drivers as well as the model outcomes!

During the event, a few of the Rittman Mead folks will be around. If you see us in sessions, around the conference or during our talks, we'd be pleased to speak with you about your projects and answer any questions you might have.

Categories: BI & Warehousing

Timestamp Functions and Presentation Variables in Oracle Cloud Analytics

Rittman Mead Consulting - Fri, 2019-11-01 04:12

One of the most popular Rittman Mead blog posts over the last 10 years is Timestamps and Presentation Variables. As we are seeing more and more migrations to OAC, we decided to review and revise this post for the latest version of Oracle Cloud Analytics (OAC), 105.4.0-140 as of October 2019. Read more about the latest updates here.


One could say that creating a chart is not the most complex task in the world of Business Intelligence, but we would argue that creating a meaningful report, one that perfectly illustrates the message hidden in the data and therefore adds value to the management, is nowhere close to easy! A good way to make a report as informative as possible is to use trends and comparisons, and a perfect tool for that is the time analysis functions. For example, comparing sales in a period of this year to the same period the year before, or measuring the similarity or dissimilarity of sales in different months of the year.

Demo Platform

I have used a free trial instance of OAC for this demo. If you haven’t done so yet, sign up for a free 30-day trial Oracle Cloud account (different to an Oracle account). Use the account to access the Oracle Cloud Infrastructure (OCI) console, the latest Oracle movement towards having one integrated cloud platform to manage all your Oracle cloud applications, platforms, and infrastructure in one place.
From the OCI console it takes 5 to 10 minutes before your free trial instance of OAC is up and running. For a detailed step-by-step guide to creating a new instance, read here.

Demo Goals

In this blog post I intend to show you how to combine the power of timestamp functions and presentation variables to create robust, repeatable reports. We will create a report that displays a year over year analysis for any rolling number of periods, by week or month, from any date in time, all determined by the user. This entire demo will only use values from a date and a revenue field.


TIMESTAMPADD() manipulates data of the data types DATE and DATETIME based on a calendar year.

Syntax: TIMESTAMPADD(interval, expr, timestamp)
Example: TIMESTAMPADD(SQL_TSI_MONTH, 12, Time."Order Date")
Description: Adds a specified number of intervals to a timestamp, and returns a single timestamp.

Read more about other calendar functions.

Building Filters

Starting to build our demo, the filter below returns all dates greater than or equal to 7 days ago including the current date.

In other words, we now have a functional filter to select all the rows where Date >= a week ago.
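Sketched in logical SQL, using the "Periods"."Day Date" column referenced later in this post, such a filter looks like this:

```sql
-- All rows from the last 7 days, including today
"Periods"."Day Date" >= TIMESTAMPADD(SQL_TSI_DAY, -7, CURRENT_DATE)
```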

As a good practice, always include a second filter giving an upper limit to the time filter. For example "Periods"."Day Date" < CURRENT_DATE would confirm that there won’t be any records that you don’t want in the mix and therefore no unnecessary strain on the system.

Let’s go one step further: instead of going 7 days back, we could try to include all the previous days in the current month, in other words dates >= the first day of the month. In this scenario, we can use the DAYOFMONTH() function to get the calendar day of any date. From there it is easy to calculate the number of days in the month so far. Our new filter would look like this:

For example, if today is October 16th, DAYOFMONTH(CURRENT_DATE) would equal 16. Thus, we would subtract 16 days from CURRENT_DATE to go back to September 30th, and adding one will give us October 1st.
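The October 16th arithmetic above can be sketched as a filter like this (subtract one fewer day than the day-of-month, so October 16th goes back 15 days to October 1st):

```sql
-- All rows since the first day of the current month
"Periods"."Day Date" >= TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(CURRENT_DATE) - 1), CURRENT_DATE)
```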

Presentation Variables

A presentation variable is a variable that can be created from the front end, in Analytics, as part of one of the following types of dashboard prompt:

  • Column prompt: associated with a column, and the values that it can take come from the column values. For information on working with column prompts, see Creating a Column Prompt.
  • Variable prompt: not associated with any column; you define the values that it can take. For information on working with variable prompts, see Creating a Variable Prompt.

Each time a user selects a value in the column or variable prompt, the value of the presentation variable is set to the value the user selects, and is then sent to any references to that variable throughout the dashboard page. These could be filters, formulas and even text boxes.

The first presentation variable we could introduce is to replace the CURRENT_DATE with a prompted value. Let’s call this presentation variable pv_Date.

  • Use the syntax @{pv_Date} to call this variable in the reports.
  • For variables of type string, surround the name in single quotes: ‘@{pv_String}’
  • It is good practice to assign a default value to the presentation variables so that you can work with your report before publishing it to a dashboard. For example the default value for the pv_Date is CURRENT_DATE so the new syntax would be @{pv_Date}{CURRENT_DATE}

Demo Time!

Our updated filter, after replacing the CURRENT_DATE, looks like the below. We will refer to this filter later as Filter 1 (F1).

The filter is starting to take shape. Now let's say we always want to look at a date range of six months before the selected date. All we need to do is create a nested TIMESTAMP function: we “wrap” our current TIMESTAMP with another that subtracts six months:

Now we have a filter to select dates that are greater than or equal to the first day of the month of any given date and all the six months prior to that.
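The nested filter just described might be sketched like this, with the prompted pv_Date defaulting to CURRENT_DATE as above:

```sql
-- First day of the month, six months before the prompted date
"Periods"."Day Date" >= TIMESTAMPADD(SQL_TSI_MONTH, -6,
    TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pv_Date}{CURRENT_DATE}) - 1),
        @{pv_Date}{CURRENT_DATE}))
```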

To take this one step further, we can create another presentation variable called pv_n to allow users to determine the number of months to include in this analysis from a dashboard prompt.

Here is the updated version of our filter using the number of periods presentation variable and a default value of 6, @{pv_n}{6}. We will refer to the following filter as Filter 2 (F2).

Our TIMESTAMPADD function is now fairly robust and will give us any date greater than or equal to the first day of the month n months before any given date. Now we will see what we just created in action, by creating date ranges to allow for a Year over Year analysis for any number of months. Consider the following filter set:

This may appear pretty intimidating at first, but if we break it into parts we can start to understand its purpose. Notice we are using the exact same filters as before, Filter 1 and Filter 2. What we have done here is filter on two time periods, separated by an OR statement.

  • The first date range defines the period as being the most recent completed n months from any given prompted date value, using a presentation variable with a default of today. Dates in the current month have been removed from the set by Filter 1.
  • The second time period, after the OR statement, is the exact same as the first only it has been wrapped in another TIMESTAMP function subtracting a year, giving you the exact same time frame for the year prior.

This allows us to create a report that can run a year over year analysis for a rolling n month time frame determined by the user.
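One plausible reconstruction of that filter set, as a sketch: the lower bound of each period is Filter 2 (first of the month, @{pv_n}{6} months back) and the upper bound plays the role of Filter 1, excluding current-month dates; the second branch wraps both bounds in a one-year shift.

```sql
-- Current period: from the first of the month n months back,
-- up to (but excluding) the first of the selected month
(   "Periods"."Day Date" >= TIMESTAMPADD(SQL_TSI_MONTH, -(@{pv_n}{6}),
        TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pv_Date}{CURRENT_DATE}) - 1),
            @{pv_Date}{CURRENT_DATE}))
AND "Periods"."Day Date" <  TIMESTAMPADD(SQL_TSI_DAY,
        -(DAYOFMONTH(@{pv_Date}{CURRENT_DATE}) - 1),
        @{pv_Date}{CURRENT_DATE}))
OR
-- Prior-year period: the same two bounds, each wrapped in a one-year shift
(   "Periods"."Day Date" >= TIMESTAMPADD(SQL_TSI_YEAR, -1,
        TIMESTAMPADD(SQL_TSI_MONTH, -(@{pv_n}{6}),
            TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pv_Date}{CURRENT_DATE}) - 1),
                @{pv_Date}{CURRENT_DATE})))
AND "Periods"."Day Date" <  TIMESTAMPADD(SQL_TSI_YEAR, -1,
        TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pv_Date}{CURRENT_DATE}) - 1),
            @{pv_Date}{CURRENT_DATE})))
```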

A note on nested TIMESTAMPS: you will always want to create nested TIMESTAMPS with the smallest interval first. Then you will wrap intervals as necessary. In this case our smallest increment is day, wrapped by month, wrapped by year.

Let’s Go Crazy

A more advanced trick: if you use real-time or near-real-time reporting, using CURRENT_DATE may be how you want to proceed. Otherwise, instead of using today as your default date value, use yesterday’s date, since most data is only as current as yesterday. Using yesterday will be valuable especially when pulling reports on the first day of the month or year - you generally want the entire previous time period rather than the empty beginning of a new one. So, to implement this, wherever you have @{pv_Date}{CURRENT_DATE}, replace it with @{pv_Date}{TIMESTAMPADD(SQL_TSI_DAY,-1,CURRENT_DATE)}

One more change to make our filter extra flexible is to use a new presentation variable to determine whether to run the year over year analysis by month or by week. This can be done by inserting a variable into the SQL_TSI_MONTH and DAYOFMONTH statements, changing MONTH to SQL_TSI_@{pv_INT}{MONTH} and DAYOF@{pv_INT}{MONTH}, where pv_INT is the name of our variable.

Start by creating a dummy variable in your prompt to allow users to select either MONTH or WEEK. You can try something like this: CASE MOD(DAY("Time"."Date"),2) WHEN 0 THEN 'WEEK' WHEN 1 THEN 'MONTH' END

The updated filter now looks like this:

In order for our interaction between Month and Week to run smoothly, we have to factor in one last consideration: if we take the date December 1st, 2019 and subtract one year we get December 1st, 2018. However, if we take the first day of this week, Sunday December 15, 2019, and subtract one year we get Saturday December 15, 2018. In our analysis this would cause an extra partial week to show up for prior years. To get around this we will add a case statement: if '@{pv_INT}{MONTH}' = 'WEEK' THEN subtract 52 weeks from the first day of the week ELSE subtract 1 year from the first day of the month. With this, our final filter set will look like this:
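The prior-year shift in that case statement can be sketched as below; pv_INT expands to WEEK or MONTH (so DAYOF@{pv_INT}{MONTH} becomes DAYOFWEEK or DAYOFMONTH), and subtracting 52 weeks keeps prior-year weeks starting on the same weekday:

```sql
-- Shift the first day of the period back one year
-- (or 52 weeks when working in week intervals)
CASE WHEN '@{pv_INT}{MONTH}' = 'WEEK'
     THEN TIMESTAMPADD(SQL_TSI_WEEK, -52,
            TIMESTAMPADD(SQL_TSI_DAY,
                -(DAYOF@{pv_INT}{MONTH}(@{pv_Date}{CURRENT_DATE}) - 1),
                @{pv_Date}{CURRENT_DATE}))
     ELSE TIMESTAMPADD(SQL_TSI_YEAR, -1,
            TIMESTAMPADD(SQL_TSI_DAY,
                -(DAYOF@{pv_INT}{MONTH}(@{pv_Date}{CURRENT_DATE}) - 1),
                @{pv_Date}{CURRENT_DATE}))
END
```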

With the use of these filters and some creative dashboarding, you can construct a report that easily allows you to view a year over year analysis from any date in time for any number of periods either by month or by week.

Filtered by Week intervals:

The formula below will give you the rolling period value to use in the analysis:

In this post, we created a cloud version of the amazing demo previously described by Brian Hall. As demonstrated, timestamp functions and their power have been among the interesting topics of visualisation and reporting for as long as we at Rittman Mead can remember, and they can still be used in the realm of the Oracle Cloud services in much the same way as in the past.

Feel free to get in touch, let us know your reviews and comments.

Categories: BI & Warehousing

OAC v105.4: Understanding Map Data Quality

Rittman Mead Consulting - Fri, 2019-10-18 08:58

Last week Oracle Analytics Cloud v105.4 was announced. One of the features particularly interested me, since it reminded me of the story of an Italian couple who wanted to spend their honeymoon in Sydney, Australia, and ended up in a city also called Sydney, but in Nova Scotia, due to a travel agency error. For the funny people out there: don't worry, it wasn't me!

The feature is "Maps ambiguous location matches" and I wanted to write a bit about it.

#OracleAnalytics 105.4 update just about to go live and deploy on your environments. Check-out some of the new features coming. Here is a first list of selected items: Stay tuned with whole #OAC team (@OracleAnalytics,@BenjaminArnulf...) for more examples

— Philippe Lions (@philippe_lions) October 8, 2019

By the way, OAC 105.4 includes a good set of new features, like a unified Home page, the possibility to customize any DV font, and more options for security and on-premises connections, amongst others. For a full list of new features check out the related Oracle blog or videos.

Maps: a bit of History

Let's start with a bit of history. Maps have been around, first in OBIEE and later in OAC, for a long time. In the earlier stages of my career I spent quite a lot of time writing HTML and JavaScript to include map visualizations within OBIEE 10g. The basic tool was called MapViewer, and the knowledge and time required to create a custom clickable or drillable map were... huge!

With the arrival of OBIEE 11g and 12c the mapping capability became easier: a new "Map" visualization type was included in Answers, and all we had to do was match the geographical reference coming from one of our Subject Areas (e.g. the Country Name) with the related column containing the shape information (e.g. the Country Shape).


After doing so, we were able to plot our geographical information properly: adding multiple layers, drilling capabilities and tooltips was just a matter of a few clicks.

The Secret Source: Good Maps and Data Quality

Perfect, you might think: we can easily use maps everywhere, as soon as we have any type of geo-location data available in our dataset! Well, the reality in the old days wasn't like that. Oracle at the time provided some sample maps with a certain level of granularity, covering only some countries in detail. What if we wanted to display all the suburbs of Verona? Sadly that wasn't included, so we were forced to either find a free source online or purchase one from a vendor.

The source of map shapes was only half of the problem: we always need to create a join with a column coming from our Subject Area! Should we use the Zip Code? What about the Address? Is the City name enough? The deeper we went into the mapping details, the more problems arose.

A common problem (as we saw before with Sydney) was using the city name. How many cities share the same name? How many regions? Is the street name correct? Data quality was, and still is, crucial to provide accurate data and not just a nice but useless map view.

OAC and the Automatic Mapping Capability

Within OAC, DV offers an Automatic Mapping capability: we only need to include in a Project a column containing a geographical reference (lat/long, country name etc.), select "Map" as the visualization type, and the tool will choose the most appropriate mapping granularity that matches our dataset.


Great! This solves all our issues! Well... not all of them! The Automatic Mapping capability doesn't have every possible map in it, but we can always include new custom maps using the OAC Console if we need them.

So What's New in 105.4?

All the above was available well before the latest OAC release. 105.4 adds the "Maps ambiguous location matches" feature, which means that every time we create a Map view, OAC will provide us with a Location Matches option.


If we click this option, OAC will provide us with a simple window where we can see:

  • How many locations matched
  • How many locations have issues
  • What type of issue occurred

The type of issue can be one of:

  • No Match, in case OAC doesn't find any comparable geographical value
  • Multiple Matches, when there are multiple possible associations
  • Partial Match, when there is a match to only part of the content

We can then take this useful information and start a data cleaning process to raise the quality of our data visualization.


Maps were, and still are, a really important visualization type in OAC. The Maps ambiguous location matches feature provides a way to understand whether our visualization is representative of our dataset. So, if you want to avoid spending your honeymoon in the wrong Sydney, or if you just want to provide accurate maps on top of your dataset, use this feature available in OAC!

Categories: BI & Warehousing

OOW19 Review: Oracle Analytics Deep Dive

Rittman Mead Consulting - Wed, 2019-09-25 05:41

In my previous blog I outlined the global news regarding Oracle, like the Always Free Tier, the new datacenter plan and the set of new tools for Data Science. Today's blog is dedicated to all the news announced regarding Oracle Analytics in any of its versions: Cloud, Server or Applications.

Oracle Analytics Server

OAS is the long-awaited replacement for OBIEE 12c on-premises and promises functional parity with OAC. The current official ETA is fiscal year 2020 and it will be available to customers as a free upgrade. With OAS, all customers still on-premises will experience the following benefits:

  • Almost 1-to-1 feature parity with OAC
  • Complete compatibility with OAC
  • Simplified cloud migration and better support for hybrid deployments

A related announcement for on-premises customers regards licensing: there is only a single license to purchase for OAS, which includes all features within it; no separate option for Mobile or Self-Service Data Visualization is needed!

Oracle Analytics for Applications

This represents the new incarnation of BI Apps, completely redesigned specifically for Fusion Apps. Like its predecessor, OAX (this is the acronym) is a packaged, ready-to-use solution with pre-built ETL and analytics content like the RPD, dashboards, analyses and KPIs. Under the covers it uses Oracle Autonomous Data Warehouse and Oracle Data Integrator Cloud. OAX is also extensible: additional datasets can be brought into ADW, and the semantic model and catalog extended.

Oracle Analytics Cloud

Several enhancements were announced, especially during the Strategy & Roadmap session given by Gabby Rubin (VP of Oracle Analytics Product Management). New features will be available in most areas of the tool, including the core of centralized reporting: the RPD.

Data Preparation

New options will be available in the Data Preparation/Enrichment phase, such as:

  • Custom enrichments based on a pre-existing set of values, e.g. enriching PRODUCT_ID with fields coming from a standard Product dimension. This is an interesting idea to enable standardization of dimensions across reporting without forcing people to write SQL or to know where the standard information comes from.
  • Forced enrichments/masking: as administrators, we could enforce transformations such as the obfuscation of fields that may contain sensitive data, like credit card numbers.
Natural Language Generation

The Natural Language view is already present in the current version of OAC; there is a plan to enhance this visualization by adding more options in the settings panel for grouping and trend analysis.


Spatial Analytics in OAC

A few weeks ago I wrote about Oracle Spatial Studio, a tool designed to provide advanced visual spatial analytics. This tool will remain and progress over time; OAC will not cover all the specific use cases of Spatial Studio. However, OAC will enhance its spatial capabilities, like:

  • Providing accurate information about row geo-location: e.g. how many rows were correctly located, how many errors occurred, and menus to fix the value-to-location association.
  • Providing spatial functions in the front end: an end user will easily be able to calculate the distance between points on a map by writing a simple Logical SQL statement. This option will probably appear in the RPD first (check the tweet below).

Yeah! Now you can natively perform #spatialanalytics on #oac! #geospatial functions are available in the RPD and can be exposed to #OracleDataVisualization!

— Francesco Tisiot (@FTisiot) September 16, 2019

As you can see, calculating the distance will be just a matter of having the correct dataset and writing a GeometryDistance function.

Connectivity and Security

One of OAC's missions is to become the analytics platform on top of any type of data source. The plan for the future is to expand the list of connectors and security/configuration options, like SSL or Kerberos. There is also a roadmap to extend Data Gateway's capabilities to query non-Oracle databases.

Modeling capabilities

In OAC we were used to either the classic RPD approach or self-service data sets. The future holds news for both approaches:

  • A new cloud web-based modeler with the objective of functional parity with the Admin Tool, so capable of handling more complex designs than the current light data modeler. I believe this will also be an effort to adapt the RPD development process to current standards of concurrent development, versioning and storage format.
  • A new self-service data model solution to build light self-service models, allowing end users to evolve datasets into proper models, sharable and optimized for reporting.

I like the idea of allowing both a top-down (centralized) and a bottom-up (self-service) approach to data modeling. This gives clients flexibility in their analytical approach while still allowing centralized rules (e.g. a single source of truth) to be enforced when needed.

Unified User Experience and Layout Customizations

As of now, the old "Answers and Dashboards" and the new "Data Visualization Projects" are almost completely separate products, each with its own home page and layout. In the next releases the two worlds will start combining, with a single home page and a similar look and feel.

Also announced, and highly requested by end users, is the ability to customize almost any option of the layout: from font type and size to the colours of any object visible in a project.

Machine Learning Integration

As discussed in the previous OOW review post, in the future OAC will be able to use models built in other tools, like Oracle Machine Learning in the Autonomous Data Warehouse or Oracle Data Science. This provides an end-to-end data science story, from data analyst to data scientist, all with a simple, secure, highly configurable and performant toolset.


As you can see, a lot of news is coming to various aspects of the tool: on-premises functional parity, a new packaged solution for Fusion Apps, and a lot of features enhancing OAC functionality and customization options.

What do you think? Is this the right direction? Do you feel there is something missing?

Categories: BI & Warehousing

OOW 2019 Review: Free Tier, New Datacenters and a New End-to-End Path for Analytics and Data Science

Rittman Mead Consulting - Mon, 2019-09-23 09:41

In the Oracle world, last week was "the week", with Oracle OpenWorld 2019 happening in San Francisco. A week full of exciting news, some of it associated with words like "free", never heard before on any Oracle-associated topic. This blog post will go into detail on some of the news, with a special focus on the Analytics and Data Science topics.

Oracle Cloud Free Tier

Let's start with the big news: Oracle Cloud Free Tier! A set of services that can ALWAYS be used for free, which includes Oracle's best offerings in the database space, like ATP (Autonomous Transaction Processing) and ADW (Autonomous Data Warehouse), as well as Compute, Storage and additional services for networking, monitoring and notifications.


This is huge news in the Oracle ecosystem, since it enables everyone to start using the products without the need for a credit card! The Always Free services can also be used in conjunction with the 30-day free trial (with an associated $300 in credits) to experience the full set of Oracle products without spending a single cent.


An additional interesting point (compared to Oracle's previous developer licensing models) is that there is nothing in the licensing terms blocking customers from using the free tier in production! This means that, if the resources provided satisfy the business requirements, anyone could potentially run production applications directly on the free tier. And, in cases where an upscale is needed, Oracle provides a very easy option to switch from a free instance to a paid one.

However, as one might expect, the free tier has limitations; for example, the databases allow only 1 OCPU and 20GB of storage each. On top of the technical limitations, none of the products in the free tier come with support or SLAs. This means, for example, that in case of problems you'll not be able to open a ticket with Oracle Support. Something definitely to ponder when implementing a production system.

OCI Gen2 and New Datacenters

During his keynote, Larry Ellison also announced the plan to launch 20 new OCI Gen2 datacenters in the next year: an average of a new datacenter every 23 days!

Taken from Oracle documentation

This is very impressive and, as mentioned during the keynote, will mean Oracle overtakes Amazon in the number of datacenters. A particular mention also needs to be given to OCI Gen2, the new generation of Oracle Cloud Infrastructure. The first generation of OCI had a pay-per-usage mission: offering a set of services available on demand and payable by the hour. OCI Gen2 adds autonomous features to Gen1: services are now self-managed, self-patched, self-monitored and self-secured, with no downtime required. OCI Gen2 removes a huge set of activities from the hands of administrators, taking human error out of the equation.

Analytics & Data Science

I gave a talk on how to become a Data Scientist with Oracle Analytics Cloud. The topic of Analytics & Data Science was really what interested me most, and my expectation of exciting news was met.

A whole new set of products will be shipped soon, making the whole data science experience more continuous and pervasive across Oracle products. Let's have a look at the news; I'll try to add links to the relevant sessions I attended.

  • Oracle Machine Learning: ML will be more pervasive in the Oracle Database; the Oracle Machine Learning Notebooks will be capable of handling R and Python in addition to SQL. All of this will be available in the cloud databases, including ADW. A new Spark option is also coming, enabling machine learning on the BDA.
  • Oracle Data Science: This is a brand new product for Data Science collaboration, work in team on projects, with sharing and versioning options available out of the box.
  • Oracle Data Catalog: again a new product, aimed at creating inventories of a company's data assets and making them searchable and easily usable by business users or data scientists.
  • Oracle Analytics Cloud: A lot of new announcements for this product which is mature and consolidated in the market, like Natural Language Generation or enhancements in the Data Enrichment, which I'll address in a separate blog post.

An interesting feature is AutoML, available both in Oracle Machine Learning and Oracle Data Science, which removes some barriers to data science by automating most of the steps in machine learning model creation, such as model and feature selection and hyper-parameter tuning.

Taken from Oracle ML presentation

You might notice several enhancements in different products. However, the key indicator of Oracle's data science maturity is the fact that all of the products above can be easily combined! Oracle Data Science will use Oracle Machine Learning optimizations when running on supported platforms, on top of datasets easily discoverable with Oracle Data Catalog. Machine learning models developed with OML or ODS can then be exposed and used by Oracle Analytics Cloud. This provides an end-to-end path from data scientist to data analyst to business user, all within the same toolset and on top of the best hardware (with support for GPUs coming)!


All in all, a great OOW full of exciting news: no more barriers to accessing Oracle Cloud with the free tier, 20 new datacenters coming in the next year, and a set of tools to perform data science, from the analyst to the data scientist, in a collaborative and extremely performant way! If you want more news about Oracle Analytics Cloud, don't miss my next blog post (edit: available here)!

Categories: BI & Warehousing

HVR | Real-Time CDC ::Oracle Autonomous DW::

Rittman Mead Consulting - Fri, 2019-09-13 09:49
Introduction

High-quality business intelligence is key to decision making in any successful organisation, and for an increasing number of businesses this means being able to access real-time data. At Rittman Mead we are seeing a big upturn in interest in technologies that will help our customers integrate real-time data into their BI systems. In a world where streaming is revolutionising the way we consume data, log-based Change Data Capture (CDC) comes into the picture as a way of providing access to real-time transactional data for reporting and analysis. Most databases support CDC nowadays; in Oracle, for example, the redo logs are a potential source of CDC data. Integration tools that support CDC are coming to the fore, and one of these is HVR.
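To illustrate what log-based CDC delivers, here is a toy sketch of how a stream of change events replays onto a target table. The event shape is invented for illustration and is not HVR's actual wire format:

```python
def apply_cdc_event(target, event):
    """Apply one change event to an in-memory 'target table'
    (a dict keyed by primary key). Illustrative event shape:
    {"op": "insert"|"update"|"delete", "key": ..., "row": {...}}
    """
    op = event["op"]
    if op == "insert":
        # A new row appears in the target
        target[event["key"]] = dict(event["row"])
    elif op == "update":
        # Only the changed columns are carried in the event
        target[event["key"]].update(event["row"])
    elif op == "delete":
        target.pop(event["key"], None)
    else:
        raise ValueError(f"unknown op: {op}")
    return target
```

Replaying every committed change in order keeps the target in sync with the source, which is exactly what a CDC channel does continuously against the redo logs.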

HVR is a real-time data integration solution that replicates data to and from many different data management technologies, both on-premise and cloud-based, including Oracle, SAP and Amazon. The HVR GUI makes it easy to connect together multiple databases, replicate entire schemas, initialise the source and target systems and then keep them in sync.

HVR have recently added Oracle’s Autonomous Data Warehouse (ADW) to their list of supported technologies and so in this blog we are stepping through the process of configuring an HVR Channel to replicate data from an Oracle database to an instance of Oracle ADW.


Before setting up replication you have to install HVR itself. This is simple enough: a fairly manual CLI job with a couple of files to create and save in the correct directories. Firewalls also need to allow all HVR connections. HVR needs a database schema in which to store the repository configuration data, so we created a schema in the source Oracle database. It also needs some additional access rights on the Oracle source database.

Process

The first step is to register the newly created hub in the HVR GUI. The GUI can be run on any machine that can connect to the server on which the HVR hub is installed. We tested two GUI instances, one running on a Windows machine and one on a Mac. Both were easy to install and configure.


The database connection details entered here are for the HVR hub database, where metadata about the hub configuration is stored.


Next we need to define our source and target. In both cases the connection between the HVR and the data uses standard Oracle database connectivity. The source connection is to a database on the same server as the HVR hub and the target connection uses a TNS connection pointing at the remote ADW instance.

Defining the source database involves right clicking on Location Configuration and selecting New Location:


Configuring the target involves the same steps:


You can see from the screenshot that we are using one of the Oracle-supplied tnsnames entries to connect to ADW and also that we are using a separate Oracle client install to connect to ADW. Some actions within HVR use the Oracle Call Interface and require a more recent version of the Oracle client than provided by our 12c database server install.

Next up is creating the "channel". A channel in HVR groups together the source and target locations and allows the relationship between the two to be defined and maintained. Configuring a new channel involves naming it, defining source and target locations, and then identifying the tables in the source that contain the data to be replicated.


The channel name is defined by right clicking on Channel Definitions and selecting New Channel.


We then open the new channel, right click on Location Groups and select New Group to configure the group that will contain source locations:


The source location is the location we defined earlier. We then right click on the newly created group and select New Action, Capture to define the role of the group in the channel:


The Capture action defines that data will be read from the locations in this group.

A second Location Group is needed for the target. This time we defined the group with the Integrate action, so that data will be written to the locations in this group.


The final step in defining the channel is to identify the tables we want to replicate. This can be done using the Table Explore menu option when you right-click on Tables:


With the channel defined, we can start synchronising the data between the two systems. We are starting with an empty database schema in our ADW target, so we use the HVR Refresh action to first create the target tables in ADW and populate them with the current contents of the source tables. As the Refresh action proceeds we can monitor progress:


Now, with the two systems in sync, we can start the process of real-time data integration using the HVR Initialise action. This creates two new jobs in the HVR Scheduler which then need to be started:


One more thing to do, of course: test that the channel is working and replication is happening in real time. We applied a series of inserts, updates and deletes to the source system and monitored the log files for the two scheduled jobs to see the activity captured from the redo logs on the source:


and then applied as new transactions on the target:


The HVR Compare action allows us to confirm that the source and target are still in sync.


Clearly the scenario we are testing here is a simple one. HVR can do much more, supporting one-to-many, many-to-many and also bi-directional replication configurations. Nonetheless we were impressed with how easy it was to install and configure HVR, and with the simplicity of executing actions and monitoring the channel through the GUI. We dipped into the command line interface when executing some of the longer-running jobs, and this was straightforward too.

Categories: BI & Warehousing

Tableau | Dashboard Design ::Revoke A50 Petition Data::

Rittman Mead Consulting - Mon, 2019-09-02 03:00

Dashboards are most powerful through visual simplicity. They're designed to automatically keep track of a specific set of metrics and keep human beings updated. Visual overload is like a binary demon in analytics that many developers seem possessed by, but less is more.

For example, many qualified drivers know very little about their dashboard besides speed, revs, temperature and fuel gauge. When an additional dash warning light comes on, even if it is just the tyre pressure icon let alone engine diagnostics light, most people will just take their car to the garage. The most obvious metrics in a car are in regard to its operation; if you didn't know your speed while driving you'd feel pretty blind. The additional and not so obvious metrics (i.e. dash warning lights) are more likely to be picked up by the second type of person who will spend the most time with that car: its mechanic. It would be pointless to overload a regular driver with all the data the car can possibly output in one go; that would just intimidate them. That's not what you want a car to do to the driver and that's certainly not what any organisation would want their operatives to feel like while their “car” is moving.

In light of recent political events, the exact same can metaphorically be applied to the big red Brexit bus. Making sense of it all might be a stretch too far for this article. Still, with appropriate use of Tableau dashboard design it is possible to answer seemingly critical questions on the topic with publicly available data.

There's An Ongoing Question That Needs Answering
Where did 6 million+ signatures really come from?

Back in the UK, the Brexit fiasco is definitely still ongoing. Just before the recent A50 extensions took place, a petition to revoke Article 50 and remain in the EU attracted more than 6 million signatures, becoming the biggest and fastest-growing petition ever, and sparking right-wing criticism over the origin of thousands of signatures, claiming that most came from overseas and discrediting its legitimacy. The government responded by rejecting the petition.

Thankfully the data is publicly available for us to use as an example of how a dashboard can be designed to settle such a question (potentially in real time too, as more signatures come in).

Tableau can handle JSON data quite well and, to nobody's surprise, we quickly discover that over 95% of signatures come from the UK.

Now that we know what we're dealing with, let's focus the map on Britain and provide the additional countries' data in a format that is easier to digest visually. As cool as it is to hover over the world map, there are simpler ways to take this in.

Because in this case we know more than 95% of signatures originate from the UK, the heatmap above is far more useful, showing us the signature count for each constituency at a glance. The hotter the shading, the higher the count.

Scales Might Need Calibration
Bar Chart All The Way

Humans of all levels read a bar chart well, and it's perfect for what we need to know: how many signatures are coming from abroad altogether, and from which countries, in descending order.

With a margin so tiny, it's trickier to get a visual that makes sense. A pie chart, for example, would hardly display the smaller slice containing all of the non-UK signatures. Even with a bar chart we struggle to see anything outside of the UK on a linear scale, but it is perfect when using a logarithmic scale, which is definitely a must in this scenario.

And voila! The logarithmic scale allows the remaining counts to appear alongside the UK, even though France, the country with the most signatures after the UK, has a count below 50k. This means we can keep an eye on the outliers in more detail quite effortlessly. Not much looks out of place right now, considering the number of expats Britain sends to the countries on the list. Now we know: as long as none of the other countries turn red, we have nothing to worry about!
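The effect of the logarithmic scale is easy to verify with a quick calculation. The `bar_length` helper below is a hypothetical illustration, not part of Tableau:

```python
import math

def bar_length(count, max_count, log_scale=False):
    """Fraction of the axis a bar of `count` occupies, either linearly
    or on a log10 scale, both normalised to the largest bar."""
    if log_scale:
        return math.log10(count) / math.log10(max_count)
    return count / max_count

uk, france = 6_000_000, 50_000
linear = bar_length(france, uk)            # under 1% of the axis: invisible
logged = bar_length(france, uk, log_scale=True)  # most of the axis: clearly visible
```

Linearly, France's bar would be well under 1% of the UK's, effectively invisible; on the log scale it still occupies most of the axis, which is why the remaining countries become readable.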

Innovate When Needed

The logarithmic scale in Tableau isn't as useful for these percentages, so hacking the visualised values in order to amplify the data sections of interest is a perfectly valid way of thinking outside the box. In this example, half the graph is dedicated to 90-100% and the other half to 0-90%. The blue chunk is the percentage of signatures coming from the UK, while every other country's colour chunk is still very small. Since the totals from other countries are about the same as each mainland constituency, it's more useful to see them as one chunk. Lastly, adding the heat colour coding keeps the visual integrity.
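The "hacked" axis can be expressed as a simple piecewise rescaling. This hypothetical function mirrors the idea of giving half the axis to the 90-100% range:

```python
def rescale(pct):
    """Map a percentage onto a 0-1 axis so that 0-90% fills the first
    half and 90-100% fills the second half, amplifying the region of
    interest (values near 100%)."""
    if pct <= 90:
        return (pct / 90) * 0.5
    return 0.5 + ((pct - 90) / 10) * 0.5
```

A UK share of, say, 96% lands well into the second half of the axis, while the tiny non-UK percentages still get the whole first half to spread out in.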


Now that we have the count, percentage and location breakdown into 3 simple graphs we feel much wiser. So it's time to make them interact with each other.

The constituency heatmap doesn't need to interact with the bar charts. The correlation between the hottest bars and the heatmap is obvious from the get-go, but if we were to filter the bars using the map, the percentages would be so tiny you wouldn't see much on the percentage graph. The same occurs for the country bar chart, meaning that only the percentage chart can usefully be used as a filter. Selecting the yellow chunk will show the count of signatures for every country within it.

Another way in which interactivity can be introduced is by adding further visualisations to the tooltip. The petition data contains the MP responsible for each constituency, so we can effectively put a count of signatures against each name. It's nice to be able to see what their parliamentary voting record has been throughout this Brexit deadlock; this was obtained publicly from the House of Commons portal (https://commonsvotes.digiminster.com) and blended in. As more votes come in, the list will automatically grow.

Keep It Simple

As you can see, 3 is the magic number here. The trio of visuals working together makes a dashing delivery of intel to the brain. With very little effort, we can see how many signatures come from the UK compared to the rest of the world, how many thousands come from each country, how many from each constituency, who the MP you should be writing to is, and how they voted in the indicative votes. Furthermore, this dashboard can keep track of all of that in real time: flagging any incoming surge of signatures from abroad, continuously counting the additional signatures until August 2019, and providing a transparent record of parliamentary votes in a format that is very easy to digest visually.

Categories: BI & Warehousing

Kafka | IoT Ecosystem ::Cluster; Performance Metrics; Sensorboards & OBD-II::

Rittman Mead Consulting - Wed, 2019-08-28 04:30

Infrastructure is the place to start, and the keyword here is scalability. Whether it needs to run on premise, in the cloud or both, Kafka makes it possible to scale at a low complexity cost when more brokers are either required or made redundant. It is equally easy to deploy nodes and nest them in different networks and geographical locations. As for IoT devices: whether it's a taxi company, a haulage fleet, a racing team or just a personal car, Kafka can make use of the existing vehicle OBD-II port using the same process; whether it's a recording studio or a server room packed with sensitive electronic equipment where climate control is critical, sensor boards can be quickly deployed and stream almost immediately into the same Kafka ecosystem. Essentially, pretty much anything that can generate data and touch Python will be able to join this ecosystem.


In large data centres it is fundamental to keep a close eye on misbehaving nodes: possibly overheating, constantly failing jobs, or causing unexpected issues. Fires can occur too. This is quite a challenge with thousands and thousands of nodes. Kafka, though, allows all of the node stats to individually stream in real time and get picked up by any database or machine, using Kafka Connect or kafka-python for consumption.

To demonstrate this on a smaller scale, and to test a humble variety of different conditions, a cluster of 7 Raspberry Pi 3 B+ nodes, Pleiades, was set up. Then, to make it easier to identify them, each computer was named after one of the stars of the Pleiades constellation.

  • 4 nodes {Alcyone; Atlas; Pleione; Maia} in a stack with cooling fans and heatsinks
  • 1 node in a metal case with a heatsink {Merope}
  • 1 node in a plastic case {Taygeta}
  • 1 node in a touchscreen plastic case {Electra} (yes, it's a portable RetroPie, a Kafka broker and perfect for Grafana dashboards too)

Every single node has been equipped with the same Python Kafka producer script, from which the stream is updated every second in real time under one topic, Pleiades. Measures taken include CPU percentage, CPU temperature, total free memory, available system memory and current CPU frequency.
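A minimal sketch of such a producer script might look like the following. The payload fields mirror the measures listed above, but the function names, broker address, Pi temperature path and the kafka-python/psutil usage are illustrative assumptions, not the exact script running on the cluster:

```python
import json
import time

def metrics_message(node, cpu_pct, cpu_temp_c, free_mem_mb, avail_mem_mb, cpu_hz):
    """Serialise one reading as the JSON value sent to the Pleiades topic."""
    return json.dumps({
        "node": node,
        "ts": time.time(),
        "cpu_pct": cpu_pct,
        "cpu_temp_c": cpu_temp_c,
        "free_mem_mb": free_mem_mb,
        "avail_mem_mb": avail_mem_mb,
        "cpu_hz": cpu_hz,
    })

def read_cpu_temp():
    """Raspberry Pi exposes the SoC temperature in millidegrees Celsius."""
    with open("/sys/class/thermal/thermal_zone0/temp") as f:
        return int(f.read()) / 1000

def run_producer(node, broker="localhost:9092", topic="Pleiades"):
    """Stream one reading per second into Kafka. Requires the third-party
    kafka-python and psutil packages; the broker address is illustrative."""
    from kafka import KafkaProducer
    import psutil
    producer = KafkaProducer(bootstrap_servers=broker)
    while True:
        mem = psutil.virtual_memory()
        freq = psutil.cpu_freq()
        msg = metrics_message(node, psutil.cpu_percent(),
                              read_cpu_temp(), mem.free // 2**20,
                              mem.available // 2**20,
                              freq.current * 1e6 if freq else None)
        producer.send(topic, msg.encode("utf-8"))
        time.sleep(1)
```

Running `run_producer("Alcyone")` on each node with its own name is all it takes for every member of the cluster to report into the same topic.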


Kafka then connects to InfluxDB on Pleione, which can be queried from the terminal through a desktop or Android SSH client. There is nothing to worry about in terms of duplication, load balancing or gaps in the data. Worst case scenario, InfluxDB crashes and the data will still be retrievable, using KSQL to rebuild the gap in the database depending on the retention policy set.
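On the consuming side, each JSON message needs converting into a point InfluxDB can store. A hedged sketch of that conversion, assuming a payload carrying a node name, a Unix timestamp `ts` and numeric metric fields (field names illustrative):

```python
import json

def to_line_protocol(payload_json, measurement="pleiades"):
    """Convert one Kafka message (a JSON string) into an InfluxDB
    line-protocol point, tagged by node name."""
    p = json.loads(payload_json)
    tags = f"node={p['node']}"
    # Everything that isn't the tag or the timestamp becomes a field
    metrics = {k: v for k, v in p.items() if k not in ("node", "ts")}
    fields = ",".join(f"{k}={v}" for k, v in metrics.items())
    ts_ns = int(p["ts"] * 1e9)  # InfluxDB line protocol defaults to nanoseconds
    return f"{measurement},{tags} {fields} {ts_ns}"
```

Tagging by node is what lets Grafana later break the same measurement down per star of the constellation.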


We can query InfluxDB directly from the command line. The measurement (InfluxDB table) for Pleiades is looking good and holding plenty of data for us to see in Grafana next.


A live feed is then delivered with Grafana dashboards. It's worth noting how mobile friendly these dashboards really are.


At a glance, we know the critical factors such as how much available memory there is and how much processing power is being used, for the whole cluster as well as each individual node, in real time and anywhere in the world (with an internet connection).

We then observed that the nodes in the stack remain fairly cool and stable, between 37 °C and 43 °C, whereas the nodes in plastic cases sit around 63 °C. Merope is in the metal casing with a heatsink, so it makes sense to see it right in the middle at 52 °C. Spikes in temperature and CPU usage are directly linked to running processes, and these spikes are followed by software crashes. Moving some of the processes from the plastic enclosures over to the stack nodes stopped Grafana from choking; this was a recurring issue when connecting to the dashboards from an external network. Kafka made it possible to track the problem in real time and come up with a solution much more quickly and effortlessly, and then immediately track whether that solution was the correct approach. In the end, the SD cards between Electra and Pleione were quickly swapped, effectively moving Pleione to the fan-cooled stack where it was much happier.

If too many spikes begin to occur, we should expect nodes to soon need maintenance, repair or replacement. KSQL makes it possible to tap into the Kafka streams and join them to data stored in the DW to forecast these events with increased precision and notification time. As a platform, it's machine-learning heaven. KSQL also makes it possible to join two streams together and thus create a brand new stream; so, to add external environment metrics and see how they may affect our cluster metrics, a sensor board on a Raspberry Pi Zero W was set up producing data into our Kafka ecosystem too.
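The stream-join idea can be sketched outside Kafka too. The following Python snippet (illustrative only, all names hypothetical) shows the kind of windowed pairing a KSQL stream-stream join performs:

```python
from collections import defaultdict

# Illustrative sketch only (not KSQL): what a windowed stream-stream join does
# conceptually - pair records from two streams that share a key and are close in time.
def join_streams(cluster_events, env_events, window_s=10):
    """Each event is (key, timestamp_s, payload); returns matched payload pairs."""
    env_by_key = defaultdict(list)
    for key, ts, payload in env_events:
        env_by_key[key].append((ts, payload))
    joined = []
    for key, ts, payload in cluster_events:
        for env_ts, env_payload in env_by_key[key]:
            if abs(ts - env_ts) <= window_s:
                joined.append((key, payload, env_payload))
    return joined

# e.g. a node's CPU temperature joined with the room temperature read nearby in time
pairs = join_streams([("pleione", 100, {"cpu_temp": 71})],
                     [("pleione", 103, {"room_temp": 22})])
```

In KSQL the equivalent would be a `CREATE STREAM ... AS SELECT ... JOIN ... WITHIN` statement; the point is that matching happens on a key within a time window, continuously, as records arrive.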


To keep track of the room conditions where the cluster sits, an Enviro pHAT sensor board is being used. It measures temperature, pressure, colour and motion. There are many sensor boards available for SBCs like the Raspberry Pi that can just as easily be added to this Kafka ecosystem. Again, it's important to emphasise that both the data streams and the dashboards can be accessed from anywhere with an internet connection.


OBD-II data from vehicles can be added to the ecosystem just as well. There are a few ways this can be achieved; the most practical, cable-free option is a Bluetooth ELM327 device. This is a low-cost adaptor that can be purchased and installed on pretty much any vehicle built after 1995. The adaptor plugs into the OBD-II socket in the vehicle and connects via Bluetooth to a Pi Zero W, which then connects to a mobile phone's 4G set up as a wi-fi hotspot. Once the data is flowing, as far as needing a Kafka topic goes, the create command is pretty straightforward.
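The exact command isn't shown here, but a topic-creation call with the stock Kafka CLI typically looks like the sketch below; the broker host/port and the partition and replication counts are assumptions:

```python
# Hypothetical kafka-topics invocation for the OBD topic, expressed as an argv list;
# alcyone:9092 and the partition/replication numbers are illustrative assumptions.
create_topic_cmd = [
    "kafka-topics", "--create",
    "--bootstrap-server", "alcyone:9092",
    "--replication-factor", "2",
    "--partitions", "3",
    "--topic", "OBD",
]
command_line = " ".join(create_topic_cmd)
```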


With the obd-producer Python script running, another equally straightforward command opens up the console consumer for the OBD topic on Alcyone, and we can check whether we have streams and whether the OBD data is flowing through Kafka. A quick check on my phone reveals we have flow.


To make things more interesting, the non-fan nodes in plastic and metal enclosures {Taygeta; Electra; Merope} were moved to a different geographical location and set up under a different network. This makes network outages and power cuts less likely to affect our dashboard services or our ability to access the IoT data. Adding cloud services to mirror this setup would make it virtually bulletproof; no single point of failure is the aim of the game. When the car is on the move, Kafka updates InfluxDB and Grafana in real time, and the intel can be tracked live, as it happens, from a laptop, desktop or phone anywhere in the world.


In a fleet scenario, harsh braking could trigger a warning and have the on-duty tracking team take immediate action; if the accelerometer spikes as well, then that could suggest an accident may have just occurred or payload checks may be necessary. Fuel management systems could pick up on driving patterns and below average MPG performance, even sense when the driver is perhaps not having the best day. This is where the value of Kafka in IoT and the possibilities of using ML algorithms really becomes apparent because it makes all of this possible in real time without a huge overhead of complexity.


After plugging the OBD-II Bluetooth adapter into the old E92 335i and driving it for 20 minutes, with the car automatically streaming data over the internet to the Kafka master, Alcyone, and automatically creating and updating an OBD measurement in Pleione's InfluxDB, it can quickly be observed in Grafana that the engine doesn't enjoy idling that much: the coolant and intake air temperatures dropped right down as it started moving at a reasonable speed. This kind of correlation is easy to spot in time-series Grafana dashboards, whereas it would be far less intuitive with standard vehicle dashboards that provide only current values.


So now a real bare-metal infrastructure exists: a self-monitoring, low-power-consumption cluster, spread across multiple geographical locations, keeping track of enviro-sensor producers in multiple places, logging all vehicle data and learning to detect problems as far ahead as possible. Thanks to Kafka's inherent scalability, adding sensor data pickup points to this ecosystem is simple. With the right Kafka-Fu, pretty much everything is plug-&-play from this point onwards, meaning we can now go on to connecting, centralising and automating as many things in life as possible that can become IoT, with Kafka as the core engine under the hood.

Categories: BI & Warehousing

OAC Row Limits and Scale Up or Down

Rittman Mead Consulting - Wed, 2019-08-28 04:26

I created an OAC instance the other day for some analysis in preparation for my OOW talk, and during the analytic journey I hit the row limit with the error Exceeded configured maximum number of allowed input records.


Since a few releases back, each OAC instance has fixed row limits depending on the number of OCPUs assigned; these can be checked in the related documentation, with the current ones shown in the table below.


If you plan on using BI Publisher (included in OAC since a few versions ago), also check the related limits.


Since in my analytical journey I had reached the row limit, I wanted to scale up my instance but, surprise surprise, the Scale Up or Down option wasn't available.


After some research I understood that Scaling Up & Down is available only if you originally chose a number of OCPUs greater than one. This is in line with Oracle's suggestion to use 1 OCPU only for non-production instances, as stated in the instance creation GUI.


When an OAC instance is originally created with 4 OCPUs, the Scale Up/Down option becomes available (you need to start the instance first).


When choosing the scale option, we can decide whether to increase/decrease the number of OCPUs.


Please note that the number of OCPUs we can increase or decrease by may be limited, depending on availability and current usage.

Concluding, if you want to be able to Scale Up/Down your OAC instances depending on your analytic/traffic requirements, always start your instance with a number of OCPUs greater than one!

Categories: BI & Warehousing

Rittman Mead at Oracle OpenWorld 2019

Rittman Mead Consulting - Fri, 2019-08-23 07:52

Oracle OpenWorld is coming soon! 16th-20th September at Moscone Center, San Francisco. It's Oracle's biggest conference, and I'll represent Rittman Mead there with the talk "Become a Data Scientist", exploring how Oracle Analytics Cloud can speed any analyst's path to data science. If you are an analyst looking to take your first steps in data science, or a manager trying to understand how to optimize your business analytics workforce, look no further: this presentation is your kickstarter!


For an introduction to the topic, have a look at my blog post series: episodes I, II and III.

If you'll be at OOW2019 and you see me around, don't hesitate to stop me! I’d be pleased to speak with you about OAC, Analytics, ML, and more important topics like food or wine as well!

Categories: BI & Warehousing

Where is My Money Going? Checking the balance with Oracle Cloud Account Metering REST APIs

Rittman Mead Consulting - Wed, 2019-08-07 08:19
How much did I spend so far?

It's a typical question we ask ourselves daily and what do we do? Probably check the bank account status on our phone and yell at ourselves for all the money we trashed the previous night at the pub.

The Cloud

One of the great benefits of the cloud is that there is no big upfront cost required to start playing with the latest tool or technology: we just need to fill in a few forms, write down the credit card details and there we go! However, the cloud doesn't mean free: most of the time we pay based on resource and time consumption, and things can become pretty expensive if we don't manage our resources wisely.

The main Oracle Cloud Dashboard offers a preview of the Month to Date Cost and, by clicking on it, we can easily understand the cost per product. In the example below we spent £322.80 month to date, of which £262.80 was on Oracle Analytics Classic.


We can go another step down the line: if for example, we have multiple versions of the same product, we'll see a line for each version or licensing method. In our case, all the money comes from a single B88303 - OAC Enterprise Edition OCPU per Hour product with an overall 60 hours of uptime (OAC billing is per hour).  


However, this requires a manual login into the Oracle Cloud to check the balance, which is not optimal if we want to display this information in external tools or automate part of the cost-checking procedures. Fortunately, we can retrieve the same information with Oracle Cloud Account Metering REST APIs.

Oracle Cloud Account Metering REST APIs

Oracle Cloud Account Metering REST APIs expose a lot of useful information about our Oracle Cloud account via REST APIs. We can, for example, check our subscription details, credit promotions, resource usage, cost and quotas. All we need to test the REST APIs is cURL, a command-line utility for sending HTTP requests. The syntax to retrieve the data is



  • <TENANT_ID> is the identity domain ID, which you can find under Oracle Analytics Cloud -> Overview
  • <BASE_PATH> is the base URI of the resource, e.g. /metering/api/v1
  • <RESOURCE_NAME> is the name of the specific resource we are requesting
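As a sketch of how those pieces assemble into a request URL (the host below was the documented metering endpoint at the time of writing, but treat it as an assumption for your tenancy):

```python
from urllib.parse import quote

# Assumed host for the Oracle Cloud Account Metering REST APIs
BASE_URL = "https://itra.oraclecloud.com"

def metering_url(base_path, resource_name):
    """Compose <BASE_URL><BASE_PATH>/<RESOURCE_NAME>; slashes in the resource are kept."""
    return f"{BASE_URL}{base_path}/{quote(resource_name, safe='/')}"

# The request itself is authenticated with HTTP Basic credentials for the account
# (cURL's -u flag); the account id below is a made-up placeholder.
url = metering_url("/metering/api/v1", "usagecost/ACC-12345")
```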
Checking the Cost

If, as per the example below, we want to understand the cost, we simply need to call the usagecost resource, passing the <ACCOUNT_ID> parameter, which can be found on the Overview page of every service we already have in our account.


The basic cURL command to check the cost then becomes the following


where, on top of the parameters defined above, we have

  • <START_TIME> and <END_TIME>, with the format YYYY-MM-DDTHH:mm:ss.sssZ, e.g. 2019-08-01T00:00:00.000
  • <TIMEZONE>: the timezone to use for the date and time filter

So if like before, we're aiming to understand the cost from the beginning of the month, our suffix becomes
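A sketch of how that query-string suffix can be assembled (parameter names come from the bullet list above; the values mirror the canonicalLink returned in the response further down):

```python
from urllib.parse import urlencode

# Month-to-date filter for the usagecost call; times use the YYYY-MM-DDTHH:mm:ss.sssZ format
params = {
    "startTime": "2019-08-01T00:00:00.000Z",
    "endTime": "2019-08-10T23:59:00.000Z",
    "timeZone": "UTC",
    "usageType": "TOTAL",
}
suffix = "?" + urlencode(params)
```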


The result is in JSON format, which we can easily parse with the command-line tool jq.

curl -X GET ... | jq '.'

The output is

  "accountId": "<ACCOUNT_ID>",
  "items": [
      "subscriptionId": "...",
      "subscriptionType": "PRODUCTION",
      "serviceName": "ANALYTICS",
      "resourceName": "ANALYTICS_EE_PAAS_ANY_OCPU_HOUR",
      "currency": "GBP",
      "gsiProductId": "B88303",
      "startTimeUtc": "2019-08-01T00:00:00.000",
      "endTimeUtc": "2019-08-10T23:00:00.000",
      "serviceEntitlementId": "...",
      "costs": [
          "computedQuantity": 60,
          "computedAmount": 262.8,
          "unitPrice": 4.38,
          "overagesFlag": "Y"
  "canonicalLink": "/metering/api/v1/usagecost/<ACCOUNT_ID>?timeZone=Europe%2FRome&startTime=2019-08-01T00%3A00%3A00.000Z&endTime=2019-08-10T23%3A59%3A00.000Z"

As expected, within the items section we get an entry for every product and license type we have used. In our case we have "serviceName": "ANALYTICS" with the Enterprise Edition option billed per hour ("resourceName": "ANALYTICS_EE_PAAS_ANY_OCPU_HOUR"); we used it for 60 hours at a unit price of £4.38, for a total amount of £262.80, perfectly in line with what we see on the webpage.
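The arithmetic is easy to sanity-check from the response fields (values copied from the item above):

```python
# computedAmount should equal computedQuantity * unitPrice for the ANALYTICS item
cost = {"computedQuantity": 60, "computedAmount": 262.8, "unitPrice": 4.38}
total = round(cost["computedQuantity"] * cost["unitPrice"], 2)
```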

We can further filter our query using one of the following parameters:

  • computeType: the nature of the cost (Usage, Overcharge...)
  • datacenter: the datacenter for which cost needs to be retrieved
  • dcAggEnabled: to roll up the cost by datacenter
  • resourceName: the type of resource billing (e.g. ANALYTICS_EE_PAAS_ANY_OCPU_HOUR)
  • serviceEntitlementId: the ID of the service, which can be found on the Overview page
  • serviceName: the name of the service e.g. ANALYTICS
  • usageType: the type of usage we want to be reported either TOTAL, HOURLY or DAILY

Unfortunately, none of the above filters allows us to check the cost associated with a precise instance of a service. If, for example, we have two instances with the same edition and type of billing, we can't determine with the above call what the cost associated with each of the two instances is, since it's rolled up at instance-type level. But we're not alone! We can achieve more granularity in the billing metrics by using the /tagged REST API and properly tagging our service instances.

Instance Tagging

We can group instances of various services with Tags. Tags are labels that we can attach to an instance to group them based on our company rules. Oracle allows two types of tagging for resources: free-form and defined.

With free-form tagging we can append any key-value label to our instances, e.g. we may want to tag an instance as Environment:Training with Environment being the key and Training being the label.


The problem with free-form tagging is that we don't have control of which tag keys get associated to a certain resource and it's an error-prone method since we have to type a key and value every time (and they're not visible for cost-tracking).

If instead we want a more controlled approach to tagging, we can go for defined tagging: while with free-form tagging anyone is able to associate any key or value, with defined tagging we create a namespace which contains a set of tag keys.


Once the namespace is created, we can then create the set of keys within it. In this case, we create two pre-defined keys, Environment and Project; please note that we ticked the COST-TRACKING checkbox to be able to use the tags with the Oracle Cloud Account Metering APIs.


Please be aware that there are limits on the number of namespaces, tags per resource and cost-tracking tags; these are listed in the tagging documentation.

Now it's time to attach the defined tags to our instances, we can do so in the web UI during instance creation or after by selecting "Add Tags". More information under the related documentation.


After we added the tags marked for cost-tracking to our instances we may have to wait up to 5 hours to see them in the "My Services" or via the REST APIs.

Querying Tagged Resources

There is an API within the Oracle Cloud Account Metering REST APIs which allows querying the cost associated with tagged resources. The call is very similar to the one we used above, with an additional tagged path element and a tags=... parameter. Taking the example above, if we want to see the consumption associated with instances tagged as Operations:Project=Training, then the call is the following
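A sketch of how the tags filter is encoded (the tag value matches the example above; the encoding mirrors what appears in the canonicalLink of the response below):

```python
from urllib.parse import quote

# The /tagged variant takes tags=<namespace>:<key>=<value>, URL-encoded
tag = "Operations:Project=Training"
tags_param = "tags=" + quote(tag, safe="")
```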


And the result is

  "accountId": "<ACCOUNT_ID>",
  "items": [
      "subscriptionId": "...",
      "subscriptionType": "PRODUCTION",
      "serviceName": "ADWC",
      "resourceName": "ADWC_PAAS_BYOL_OCPU_HOUR",
      "currency": "GBP",
      "gsiProductId": "B89039",
      "startTimeUtc": "2019-08-01T00:00:00.000",
      "endTimeUtc": "2019-08-10T23:00:00.000",
      "serviceEntitlementId": "...",
      "costs": [
          "computedQuantity": 23.0,
          "computedAmount": 8.06235468,
          "unitPrice": 0.35053716,
          "overagesFlag": "N"
  "canonicalLink": "/metering/api/v1/usagecost/<ACCOUNT_ID>/tagged?timeZone=UTC&startTime=2019-08-01T00%3A00%3A00.000Z&endTime=2019-08-10T23%3A59%3A00.000Z&usageType=TOTAL&tags=operations%3AProject%3DTraining"

A usage of ADWC for 23 hours, for a total of £8.06, which is also visible on the My Services webpage.


Appending the following jq command to the cURL call also displays the relevant information, such as serviceName and the cost details, as separate columns

jq --raw-output '.items[] | "\(.serviceName)\t\(.subscriptionType)\t\(.resourceName)\t\(.currency)\t\(.costs[].computedAmount)\t\(.costs[].computedQuantity)\t\(.costs[].unitPrice)\t\(.costs[].overagesFlag)"' 
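For comparison, the same flattening can be done in Python (a sketch over the fields shown in the response above):

```python
import json

# Minimal response mirroring the fields used by the jq expression above
raw = """{"items": [{"serviceName": "ADWC", "subscriptionType": "PRODUCTION",
 "resourceName": "ADWC_PAAS_BYOL_OCPU_HOUR", "currency": "GBP",
 "costs": [{"computedAmount": 8.06235468, "computedQuantity": 23.0,
            "unitPrice": 0.35053716, "overagesFlag": "N"}]}]}"""

rows = []
for item in json.loads(raw)["items"]:
    cost = item["costs"][0]
    rows.append("\t".join(str(v) for v in (
        item["serviceName"], item["subscriptionType"], item["resourceName"],
        item["currency"], cost["computedAmount"], cost["computedQuantity"],
        cost["unitPrice"], cost["overagesFlag"])))
```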

And the result is


Oracle Cloud Account Metering REST APIs offer an easy way to expose Oracle Cloud usage and cost externally. Used smartly in conjunction with instance tagging, they provide a way to track cost and usage down to the single resource or project.

If, on the other hand, integration with REST APIs is not what you need, but you're looking for ways of getting notified when you're spending too much, check out the Alerts section of Managing and Monitoring Oracle Cloud.

Edit: You can also download your billing information as CSV from the Oracle Cloud web UI, as per the screenshot below, which is very handy for one-off analysis. If, on the other hand, you want to automate the export of billing information, then the Oracle Cloud Account Metering REST APIs are the way to go!

Categories: BI & Warehousing

Spatial Analytics Made Easy: Oracle Spatial Studio

Rittman Mead Consulting - Fri, 2019-07-26 10:31
 Oracle Spatial Studio

Let's say we need to understand where our company should open a new shop. Most of the time the decision is driven by gut feeling and some knowledge of the market and client base, but what if we could have visual insight into where the high-density zones of customers not covered by a nearby shop are, like in the map below?


Well... welcome Oracle Spatial Studio!

Spatial Studio is Oracle's new tool for creating spatial analytics with a visual GUI. It uses Oracle Spatial database functions in the backend, exposed through an interface in line with the Oracle Analytics Cloud one. Let's see how it works!

QuickStart Installation

First of all, we need to download Spatial Studio from the Oracle web page. For this initial test I downloaded the "Quick Start", a self-contained version pre-deployed in a lightweight application server. For more robust applications you may want to download the EAR file, deployable in WebLogic.


Once the file is downloaded and unzipped, we just need to verify we have Java JDK 8 (update 181 or higher) under the hood, and we can immediately start Oracle Spatial Studio with the provided ./ start script.

The command starts the service on the local machine, which can then be accessed at https://localhost:4040/spatialstudio. By default the Oracle Spatial Studio Quick Start uses the HTTPS protocol with self-signed certificates, so the first time you access the URL you will need to add a security exception in your browser. Configuration such as the port, JVM parameters, host and HTTP/HTTPS protocol can be changed in the conf/server.json file.

We can then log in with the default credentials admin/welcome1


The first step in the Spatial Studio setup is the definition of the metadata connection type. This needs to point to an Oracle database with the spatial option. For my example I initially used an Oracle Autonomous Data Warehouse, for which I had to drop the wallet and specify the schema details.


Once logged in, the layout and workflows are very similar to Oracle Analytics Cloud, making the transition between the two very easy (more details on this later on). In the left menu we can access, as in OAC, Projects (visualizations), Data, Jobs and the Console.


In order to do spatial analysis we need to start from a dataset: this can be an existing table or view, or we can upload local files. To create a dataset, click on Create and then Dataset.


We have then three options:

  • Upload a Spreadsheet containing spatial information (e.g. Addresses, Postcodes, Regions, Cities etc)
  • Upload a Shapefile containing geometric locations and associated attributes.
  • Use spatial data from one of the existing connections, this can point to any connection containing spatial information (e.g. a table in a database containing customer addresses)
Sample Dataset with Mockaroo

I used Mockaroo, a realistic data generator service, to create two Excel files: one containing customers with related locations, and a second one with shops and their latitude and longitude. All I had to do was select which fields I wanted to include in each file and the related datatypes.


For example, the list of shop dataset contained the following columns:

  • Id: as row number
  • Shop Name: as concatenation of Shop and the Id
  • Lat: Latitude
  • Long: Longitude
  • Dept: the Department (e.g. Grocery, Books, Health&Beauty)

Mockaroo offers a perfect service and has a free tier for datasets with fewer than 1,000 rows, which is useful for demo purposes. For each column defined, you can choose from a good variety of column types; you can even define your own type using regular expressions!

Adding the Datasets to Oracle Spatial Studio

Once we have the two datasources in Excel format, it's time to start playing with Spatial Studio. We first need to upload the datasets, which we can do via Create and Dataset, starting with Customer.xlsx. Once the file to upload is selected, Spatial Studio provides (as OAC does) an overview of the dataset, together with options to change configuration such as the dataset name, target destination (the metadata database) and column names.


Once the table name is changed to TEST_CUSTOMERS and Submit is clicked, Spatial Studio starts inserting all the rows into the SPATIAL_STUDIO connection, a routine that could take seconds or minutes depending on the dataset volume. When the upload routine finishes, the TEST_CUSTOMERS table appears in the list of datasets.


We can immediately see the yellow warning sign next to the dataset name. It's there because the dataset has no geocoded information yet; we can solve this by clicking on the options button and then Prepare and Geocode Addresses.


Oracle Spatial Studio will suggest, based on the column content, some geo-type matching e.g. City Name, Country and Postal Code. We can use the defaults or modify them if we feel they are wrong.


Once clicked on Apply the geocoding job starts.


Once the job ends, we can see the location icon next to our dataset name


We can do the same for the Shops.xlsx dataset, uploading it and storing it as the TEST_SHOPS dataset.


Once the dataset is uploaded, I can geolocate the information based on latitude and longitude by clicking on the options button and selecting Prepare and Create Lon/Lat Index. Then I need to assign the Longitude and Latitude columns correctly and click OK.

Spatial Analytics

Now it's time to do some spatial analysis, so I click on Create and Project and am presented with an empty canvas by default.


The first step is to add a map; I can do that by selecting the visualizations menu and then dragging the map onto the canvas.


The next step is to add some data by clicking on Data Elements and then Add Dataset.


I select the TEST_CUSTOMERS dataset and add it to the project, then I need to drag it on top of the map to visualize my customer data.


Oracle Spatial Studio offers several options to change the data visualization, like color, opacity, blur etc.


Now I can add the TEST_SHOPS dataset and visualize it on the map with the same set of steps followed before.


It's finally time for spatial analysis! Let's say, as per the initial example, that I want to know which of my customers don't have any shop within 200 km. To achieve that, I first need to create 200 km buffer areas around the shops, by selecting the TEST_SHOPS datasource and then clicking on Spatial Analysis.


This opens a popup window listing a good number of spatial analyses; by clicking on the Transform tab I can see the Add a buffer of a specified distance option.


Unfortunately the buffer function is not available in ADW at the moment.


I had to rely instead on an Oracle Database Cloud Service 18c Enterprise Edition - High Performance (which includes the Spatial option) for my metadata storage and processing. A few takeaways:

  • Select 18c (or anything above 12.2): I hit an ORA-00972: identifier is too long issue when importing the data into a 12.1 database, which (thanks, StackOverflow) is fixed as of 12.2.
  • High Performance: This includes the Spatial Option

Once the DBCS was in place as the metadata store, I could finally use the buffer function and set a 200 km buffer around the shops.


The TEST_SHOPS_BUFFER is now visible under Analysis and can be added on top of the Map correctly showing the 200km buffer zone.


I can understand which customers have a shop within 200 km by creating an analysis and selecting the option "Return shapes within a specified distance of another".


In the parameters I select TEST_CUSTOMERS as the layer to be filtered, TEST_SHOPS as the layer to be used as the filter, and 200 km as the distance.
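Conceptually, the filter keeps every customer with at least one shop inside the radius. A rough Python sketch of the idea follows (great-circle distance, all coordinates hypothetical; in Spatial Studio the real work is done by SDO_WITHIN_DISTANCE in the database):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def within_distance(customers, shops, km=200):
    """Customers having at least one shop within `km` kilometres."""
    return [c for c in customers
            if any(haversine_km(c["lat"], c["lon"], s["lat"], s["lon"]) <= km
                   for s in shops)]

# Hypothetical points: one customer a few km from the shop, one hundreds of km away
shops = [{"lat": 51.48, "lon": -0.20}]
near = within_distance([{"lat": 51.50, "lon": -0.12},   # close to the shop
                        {"lat": 48.85, "lon": 2.35}],   # well outside 200 km
                       shops)
```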


I can then visualize the result by adding the TEST_CUSTOMERS_WITHIN_DISTANCE layer in the map.


TEST_CUSTOMERS_WITHIN_DISTANCE contains the customers already "covered" by a shop in the 200 km range. What I may want to do now is remove them from my list of customers in order to analyse the remaining ones; how can I do that? Unfortunately, in this first Spatial Studio version there is no visual way of doing DATASET_A MINUS DATASET_B but, hey, it's just the first incarnation, and we can expect that type of function and many others to be available in future releases!

The following section is an in-depth look, inside the database, at functions that will probably be exposed in future Spatial Studio versions; if you're not interested, skip directly to the section named "Progressing in the Spatial Analysis".

A Look in the Database

Since we still want to achieve our goal of getting the customers not covered by a shop, we need to look a bit deeper at where the data is stored: in the database. This gives us two opportunities: to check how Spatial Studio works under the covers, and to freely use SQL to achieve our goal (DATASET_A MINUS DATASET_B).

First let's have a look at the tables created by Spatial Studio: we can see some metadata tables used by Studio, as well as the database representations of our two Excel files, TEST_CUSTOMERS and TEST_SHOPS.


Looking in depth at the metadata, we can also see a table named SGTECH$TABLE followed by an ID. That table collects the information regarding the geocoding job we executed against our customers dataset, whose rows were located starting from zip codes and addresses. We can associate the table with the TEST_CUSTOMERS dataset using the following query against the SGTECH_OBJECTS metadata table.

  SELECT JSON_VALUE(data, '$.gcHelperTableName') DATASET
  FROM sgtech_objects;

The SGTECH$TABLEA004AA549110B928755FC05F01A3EF89 table contains, as expected, a row for each customer in the dataset, together with the related geometry if the geo-coding was successful and some metadata flags like GC_ATTEMPTED, GC_STATUS and GC_MATCH_CODE stating the accuracy of the geo-coding match.


What about all the analysis like the buffer and the customers within distance? For each analysis Spatial Studio creates a separate view with the SGTECH$VIEW prefix followed by an ID.


To understand which view refers to which analysis, we need to query the metadata table SGTECH_OBJECTS with a query like

  SELECT JSON_VALUE(data, '$.tableName') DATASET
  FROM sgtech_objects;

With the following result


We know then that TEST_CUSTOMERS_WITHIN_DISTANCE can be accessed via the view SGTECH$VIEW0B2B36785A28843F74B58B3CCF1C51E3 and, checking its SQL, we can clearly see that it executes the SDO_WITHIN_DISTANCE function using TEST_CUSTOMERS.GC_GEOMETRY, the TEST_SHOPS columns LONGITUDE and LATITUDE, and the distance=200 unit=KILOMETER parameters we set in the front-end.

CREATE OR REPLACE FORCE EDITIONABLE VIEW "SPATIAL_STUDIO"."SGTECH$VIEW0B2B36785A28843F74B58B3CCF1C51E3" AS
SELECT "t1".*
FROM
    "TEST_CUSTOMERS"   "t1",
    "TEST_SHOPS"       "t2"
WHERE sdo_within_distance(
    "t1"."GC_GEOMETRY",
    spatial_studio.sgtech_ptf("t2"."LONGITUDE", "t2"."LATITUDE"),
    'distance=200 unit=KILOMETER'
    ) = 'TRUE';

OK, now that we understand which view contains the data, we can create a new view containing only the customers that are not within the 200 km distance:

CREATE OR REPLACE VIEW test_customers_not_within_distance AS
SELECT
    t1.id            AS id,
    t1.first_name    AS first_name,
    t1.last_name     AS last_name,
    t1.email         AS email,
    t1.gender        AS gender,
    t1.postal_code   AS postal_code,
    t1.street        AS street,
    t1.country       AS country,
    t1.city          AS city,
    t1.studio_id     AS studio_id,
    t1.gc_geometry   AS gc_geometry
FROM
    test_customers t1
WHERE
    t1.id NOT IN (
        SELECT id FROM "SGTECH$VIEW0B2B36785A28843F74B58B3CCF1C51E3"
    );
Progressing in the Spatial Analysis

In the previous section we created a database view named TEST_CUSTOMERS_NOT_WITHIN_DISTANCE containing the customers without a shop within a 200 km radius. We can now import it into Spatial Studio by creating a new dataset, selecting the connection to the database (in our case named SPATIAL_STUDIO) as the source, and then the newly created TEST_CUSTOMERS_NOT_WITHIN_DISTANCE view.


The dataset is added, but it has a yellow warning icon next to it


Spatial Studio asks us to define a primary key. We can do that by accessing the properties of the dataset, selecting the Columns tab, choosing which column acts as the primary key and validating it. After this step I can visualize these customers on a map.


What's next? Well, if I want to open a new shop, I may want to do that where there is a concentration of customers, which is easily visualized in Spatial Studio by changing the Render Style to Heatmap.


With the following output


We can clearly see some major concentrations around Dallas, Washington and Minneapolis. Focusing on Dallas, Spatial Studio also offers the option to simulate a new shop on the map and calculate the 200 km buffer around it. I can clearly see that adding a shop halfway between Oklahoma City and Dallas would allow me to cover both groups of clients within the 200 km radius.

 Oracle Spatial Studio

Please remember that this is a purely demonstrative analysis, and some of the choices, like the 200km buffer, are deliberately simplistic. Other factors could come into play when choosing a shop location, like the revenue generated by certain customers. And here comes the second benefit of Oracle Spatial Studio: we can export datasets as GeoJSON or CSV and include them in Data Visualization.

 Oracle Spatial Studio

For example, I can export the data of TEST_CUSTOMERS_NOT_WITHIN_DISTANCE from Spatial Studio and include it in a Data Visualization project, blending it with the Sales related to the same customers.

 Oracle Spatial Studio

I can now focus not only on the customers' position but also on other metrics, like Profit or Sales Amount, that I may have in other datasets. For another example of Oracle Spatial Studio and Data Visualization interoperability, check out this video from Oracle Analytics Senior Director Philippe Lions.


Spatial analytics made easy: this is the focus of Oracle Spatial Studio. Before, spatial queries were locked away at the database level, with limited access from an analyst's point of view. Now we have a visual tool with a simple GUI (in line with OAC) that easily enables spatial queries for everybody!

But this is only the first part of the story: the combination of capabilities achievable when mixing Oracle Spatial Studio and Oracle Analytics Cloud takes any type of analytics to the next level!

Categories: BI & Warehousing

KSQL in Football: FIFA Women’s World Cup Data Analysis

Rittman Mead Consulting - Mon, 2019-07-22 02:53

One of the football (as per European terminology) highlights of the summer is the FIFA Women’s World Cup. France, Brazil, and the USA are the favourites, and this year Italy is present at the event for the first time in 20 years.

From a data perspective, the World Cup represents an interesting source of information. There's a lot of dedicated press coverage, as well as the standard social media excitement following any kind of big event.

The idea in this blog post is to mix information coming from two distinct channels: the RSS feeds of sport-related newspapers and Twitter feeds of the FIFA Women’s World Cup. The goal will be to understand how the sentiment of official news related to the two teams involved in the final compares to that of the tweets.

In order to achieve our targets, we'll use pre-built connectors available in Confluent Hub to source data from RSS and Twitter feeds, KSQL to apply the necessary transformations and analytics, Google’s Natural Language API for sentiment scoring, Google BigQuery for data storage, and Google Data Studio for visual analytics.

Data sources

The beginning of our journey starts with connecting to various data sources. Twitter represents the default source for most event streaming examples, and it's particularly useful in our case because it contains high-volume event streaming data with easily identifiable keywords that can be used to filter for relevant topics.

Ingesting Twitter data

Ingesting Twitter data is very easy with Kafka Connect, a framework for connecting Kafka with external systems. Among the pre-built connectors we can find Kafka Connect Twitter; all we need to do is install it using the Confluent Hub client.

confluent-hub install jcustenborder/kafka-connect-twitter:latest

To start ingesting the Twitter data, we need to create a configuration file containing the following important bits:

  • filter.keywords: We need to list all the keywords we are interested in, separated by a comma. Since we want to check tweets from the FIFA Women’s World Cup, we’ll use FIFAWWC, representing both the World Cup Twitter handle and the most common related hashtag.
  • kafka.status.topic: The topic that will be used to store the tweets we selected, twitter_avro in our case, since the connector output format is Avro.
  • twitter.oauth: This represents Twitter credentials. More information can be found on the Twitter’s developer website.

After the changes, our configuration file looks like the following:

filter.keywords=FIFAWWC
kafka.status.topic=twitter_avro
twitter.oauth.accessToken=<TWITTER ACCESS TOKEN>
twitter.oauth.accessTokenSecret=<TWITTER ACCESS TOKEN SECRET>
twitter.oauth.consumerKey=<TWITTER ACCESS CUSTOMER KEY>
twitter.oauth.consumerSecret=<TWITTER CUSTOMER SECRET>

It's time to start it up! We can use the Confluent CLI load command:

confluent load twitter -d $TWITTER_HOME/

$TWITTER_HOME is the folder containing the configuration file. We can check the Kafka Connect status by querying the REST APIs with the following:

curl -s "http://localhost:8083/connectors/twitter/status" | jq '.connector.state'

We can also check if all the settings are correct by consuming the AVRO messages in the twitter_avro topic with a console consumer:

confluent consume twitter_avro --value-format avro

And the result is, as expected, an event stream of tweets.

RSS feeds as another data source

The second data source that we'll use for our FIFA Women’s World Cup sentiment analytics is RSS feeds from sports-related newspapers. RSS feeds are useful because they share official information about teams and players, like results, episodes, and injuries. RSS feeds should be considered neutral since they should only report facts. For this blog post, we’ll use RSS feeds as a way to measure the average sentiment of the news. As per the Twitter case above, a prebuilt Kafka Connect RSS Source exists, so all we need to do is install it via the Confluent Hub client:

confluent-hub install kaliy/kafka-connect-rss:latest

Then, create a configuration file with the following important parameters:

  • rss.urls: This is a list of space-separated RSS feed URLs. For our Women’s World Cup example, we’ve chosen the following sources: La Gazzetta dello Sport, Transfermarkt, Eurosport, UEFA, The Guardian, Daily Mail, The Sun Daily, BBC
  • topic: The Kafka topic to write to, which is rss_avro in our case

The full configuration file looks like the following:
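The configuration file itself did not survive extraction; a minimal sketch, assuming the connector class name from the kaliy/kafka-connect-rss project and using example feed URLs in place of the full list above:

```properties
name=RssSourceConnector
# connector class name assumed from the kaliy/kafka-connect-rss project
connector.class=org.kaliy.kafka.connect.rss.RssSourceConnector
tasks.max=1
# space-separated list of RSS feed URLs (shortened here to two examples)
rss.urls=https://www.theguardian.com/football/rss http://feeds.bbci.co.uk/sport/football/rss.xml
topic=rss_avro
```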


And again, we can start the ingestion of RSS feeds with the Confluent CLI:

confluent load RssSourceConnector -d $RSS_HOME/

We can check the status of the Kafka Connect connectors by querying the REST API as before, which shows:

RssSourceConnector  |  RUNNING  |  RUNNING
twitter             |  RUNNING  |  RUNNING

We can see that both the RssSourceConnector and the twitter connectors are up and running. We can then check the actual data with the console consumer:

confluent consume rss_avro --value-format avro

Below is the output as expected.

Shaping the event streams

After ingesting the Twitter and RSS event streams into topics, it’s time to shape them with KSQL. Shaping the topics accomplishes two purposes:

  1. It makes the topics queryable from KSQL
  2. It defines additional structures that can be reused in downstream applications

The Twitter stream lands in Avro format with the fields listed in the related GitHub repo. We can easily declare a TWITTER_STREAM KSQL stream on top of TWITTER_AVRO with:
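The declaration itself was stripped from this copy; based on the "important bits" listed just below, it would have been along these lines:

```sql
CREATE STREAM TWITTER_STREAM WITH (
    KAFKA_TOPIC='TWITTER_AVRO',
    VALUE_FORMAT='AVRO',
    TIMESTAMP='CREATEDAT'
);
```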


There is no need to define the single fields in the event stream declaration: since the topic is in Avro format, they will be sourced from the Confluent Schema Registry. Schema Registry is the component of the Confluent Platform in charge of storing, versioning and serving the topics' Avro schemas. When a topic is in Avro format, its schema is stored in the Schema Registry, where downstream applications (like KSQL in this case) can retrieve it and use it to "shape" the messages in the topic.

The important bits of the above KSQL for our definition are:

  • KAFKA_TOPIC='TWITTER_AVRO': the definition of the source topic
  • VALUE_FORMAT='AVRO': the definition of the source topic format
  • TIMESTAMP='CREATEDAT': the Tweet's creation date, which is used as the event timestamp

We can now check that the fields’ definition has correctly been retrieved by the Schema Registry with:
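A simple DESCRIBE does the job, mirroring the DESCRIBE RSS_STREAM call used later in this post:

```sql
ksql> DESCRIBE TWITTER_STREAM;
```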


Or, we can use the REST API by:

curl -X "POST" "http://localhost:8088/ksql" \
     -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     -d $'{
  "ksql": "DESCRIBE TWITTER_STREAM;",
  "streamsProperties": {}
}'

The resulting fields section will be:

"fields": [
"name": "ROWTIME",
"schema": {
"type": "BIGINT",
"fields": null,
"memberSchema": null
"name": "ROWKEY",
"schema": {
"type": "STRING",
"fields": null,
"memberSchema": null
"name": "CREATEDAT",
"schema": {
"type": "BIGINT",
"fields": null,
"memberSchema": null
"name": "ID",
"schema": {
"type": "BIGINT",
"fields": null,
"memberSchema": null

The same applies to the RSS feed contained in rss_avro with:

create stream RSS_STREAM with (KAFKA_TOPIC='rss_avro', VALUE_FORMAT='AVRO');

The result will be:

ksql> describe RSS_STREAM;
Name                 : RSS_STREAM
Field   | Type
ROWTIME | BIGINT           (system)

We can also use the URL manipulation functions added in KSQL 5.2 to extract useful information from the LINK column with:
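The statement did not survive here; a sketch using the URL_EXTRACT_* functions, with the column aliases matching the query shown below:

```sql
CREATE STREAM RSS_STREAM_URL_DECODING AS
SELECT LINK,
       URL_EXTRACT_HOST(LINK)     AS HOST,
       URL_EXTRACT_PATH(LINK)     AS PATH,
       URL_EXTRACT_PROTOCOL(LINK) AS PROTOCOL,
       URL_EXTRACT_QUERY(LINK)    AS QUERY_TXT
FROM RSS_STREAM;
```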


The result will be:

ksql> SELECT HOST, PATH, PROTOCOL, QUERY_TXT FROM RSS_STREAM_URL_DECODING LIMIT 5;
 | /sport/football/article-6919585/Paul-Scholes-backs-Manchester-United-spring-surprise-Barcelona.html | https | ns_mchannel=rss&ns_campaign=1490&ito=1490
 | /sport/formulaone/article-6916337/Chinese-Grand-Prix-F1-race-LIVE-Shanghai-International-Circuit.html | https | ns_mchannel=rss&ns_campaign=1490&ito=1490
 | /sport/football/article-6919403/West-Brom-make-approach-Preston-manager-Alex-Neil.html | https | ns_mchannel=rss&ns_campaign=1490&ito=1490
 | /sport/football/article-6919373/Danny-Murphy-Jermaine-Jenas-fascinating-mind-games-thrilling-title-race.html | https | ns_mchannel=rss&ns_campaign=1490&ito=1490
 | /sport/football/article-6919215/Brazilian-legend-Pele-successfully-undergoes-surgery-remove-kidney-stone-Sao-Paulo-hospital.html | https | ns_mchannel=rss&ns_campaign=1490&ito=1490
Limit Reached
Query terminated
Sentiment analytics and Google’s Natural Language APIs

Text processing is a part of machine learning and is continuously evolving with a huge variety of techniques and related implementations. Sentiment analysis represents a branch of text analytics and aims to identify and quantify affective states contained in a text corpus.

Natural Language APIs provide sentiment scoring as a service using two dimensions:

  1. Score: Positive (Score > 0) or Negative (Score < 0) Emotion
  2. Magnitude: Emotional Content Amount

For more information about sentiment score and magnitude interpretation, refer to the documentation.

Using Natural Language APIs presents various benefits:

  • Model training: Natural Language is a pre-trained model, ideal in situations where we don't have a set of already-scored corpuses.
  • Multi-language: RSS feeds and tweets can be written in multiple languages. Google Natural Language is capable of scoring several languages natively.
  • API call: Natural Language can be called via an API, making the integration easy with other tools.

Sentiment scoring in KSQL with user-defined functions (UDFs)

The Natural Language APIs are available via client libraries in various languages, including Python, C#, and Go. For the purposes of this blog post, we'll be looking at the Java implementation since it is currently the language used to implement KSQL user-defined functions (UDFs). For more details on how to build a UD(A)F function, please refer to How to Build a UDF and/or UDAF in KSQL 5.0 by Kai Waehner, which we'll use as a base for the GSentiment class definition.

The basic steps to implementing Natural Language API calls in a UDF are the following:

  1. Add the JAR dependency in your project. If you are using Maven, you just need to add the following in your pom.xml

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-language</artifactId>
    <version>1.25.0</version>
</dependency>

  2. Create a new Java class called GSentiment.
  3. Import the required classes:

//KSQL UDF Classes
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;
//Google NL Classes (package names as in the google-cloud-language v1 client)
import com.google.cloud.language.v1.AnalyzeSentimentResponse;
import com.google.cloud.language.v1.Document;
import com.google.cloud.language.v1.Document.Type;
import com.google.cloud.language.v1.LanguageServiceClient;
import com.google.cloud.language.v1.Sentiment;

  4. Define the class and add the Java annotations: @UdfDescription(name = "gsentiment", description = "Sentiment scoring using Google NL API") public class Gsentiment { ... }
  5. Within the class, declare gsentiment as the method accepting a String text as input. As of now, UDFs can't return two output values, so we return the sentiment score and magnitude as an array of double: @Udf(description = "return sentiment scoring") public List<Double> gsentiment(String text) { ... }
  6. Within the gsentiment method, invoke the Natural Language API sentiment and cast the result as an array. Since a UDF can currently return only one value, we pipe the sentiment score and magnitude into an array of two elements:
Double[] arr = new Double[2];
try (LanguageServiceClient languageServiceClient = LanguageServiceClient.create()) {
    Document document = Document.newBuilder()
        .setContent(text)
        .setType(Type.PLAIN_TEXT)
        .build();
    AnalyzeSentimentResponse response = languageServiceClient.analyzeSentiment(document);
    Sentiment sentiment = response.getDocumentSentiment();
    arr[0] = (double) sentiment.getScore();
    arr[1] = (double) sentiment.getMagnitude();
}
catch (Exception e) {
    arr[0] = (double) 0.0;
    arr[1] = (double) 0.0;
}
return Arrays.asList(arr);
  7. As mentioned in How to Build a UDF and/or UDAF in KSQL 5.0, build an uber JAR that includes the KSQL UDF and any dependencies, and copy it to the KSQL extension directory (defined by the ksql.extension.dir parameter in ksql-server.properties).
  8. Add an environment variable GOOGLE_APPLICATION_CREDENTIALS pointing to the service account key that will be used to authenticate to Google services.
  9. Restart KSQL.

At this point, we should be able to call the GSentiment UDF from KSQL:
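A call of this shape (the name of the tweet text column is an assumption) returns the score/magnitude pairs shown below:

```sql
SELECT GSENTIMENT(TEXT) FROM TWITTER_STREAM LIMIT 5;
```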

[0.10000000149011612, -0.10000000149011612]
[0.20000000298023224, 0.20000000298023224]
[0.5                , 0.10000000149011612]
[0.10000000149011612, 0.10000000149011612]
[0.0                , 0.0]
Limit Reached
Query terminated

As expected, the UDF returns an ARRAY of numbers. In order to get the sentiment score and magnitude in separate columns, we simply need to extract the relevant values:
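Since the UDF returns an array, the two values can be extracted by index; a sketch (column name assumed), which as noted just after calls GSENTIMENT twice:

```sql
SELECT GSENTIMENT(TEXT)[0] AS SENTIMENTSCORE,
       GSENTIMENT(TEXT)[1] AS SENTIMENTMAGNITUDE
FROM TWITTER_STREAM
LIMIT 5;
```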

0.20000000298023224 | 0.10000000149011612
0.30000001192092896 | 0.10000000149011612
0.800000011920929   | 0.800000011920929
0.0                 | 0.0
0.30000001192092896 | 0.30000001192092896
Limit Reached
Query terminated

However, we should note that Natural Language APIs are priced per API call and, in the SQL above, we are calling the API twice, once for each GSENTIMENT call. Therefore, the above SQL will cost us two API calls per document. To optimise the cost, we can create a new event stream TWITTER_STREAM_WITH_SENTIMENT, which will physicalize the array in Kafka.
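The creation statement is missing from this copy; a minimal sketch, with a single GSENTIMENT call whose array result is persisted (TEXT column name assumed):

```sql
CREATE STREAM TWITTER_STREAM_WITH_SENTIMENT AS
SELECT *,
       GSENTIMENT(TEXT) AS SENTIMENT
FROM TWITTER_STREAM;
```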


Next, parse the sentiment SCORE and MAGNITUDE from the TWITTER_STREAM_WITH_SENTIMENT event stream:
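A sketch of the parsing step, reading the persisted array by index (the target stream name matches the one referenced in the BigQuery section):

```sql
CREATE STREAM TWITTER_STREAM_WITH_SENTIMENT_DETAILS AS
SELECT *,
       SENTIMENT[0] AS SENTIMENTSCORE,
       SENTIMENT[1] AS SENTIMENTMAGNITUDE
FROM TWITTER_STREAM_WITH_SENTIMENT;
```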


With this second method, we optimize the cost with a single Natural Language API call per tweet. We can do the same with the RSS feeds by declaring:
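The RSS declarations are missing here too; they would mirror the Twitter ones, assuming the RSS text lives in a CONTENT column:

```sql
CREATE STREAM RSS_STREAM_WITH_SENTIMENT AS
SELECT *, GSENTIMENT(CONTENT) AS SENTIMENT
FROM RSS_STREAM;

CREATE STREAM RSS_STREAM_WITH_SENTIMENT_DETAILS AS
SELECT *,
       SENTIMENT[0] AS SENTIMENTSCORE,
       SENTIMENT[1] AS SENTIMENTMAGNITUDE
FROM RSS_STREAM_WITH_SENTIMENT;
```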

Sink to Google BigQuery

The following part of this blog post focuses on pushing the dataset into Google BigQuery and visual analysis in Google Data Studio.

Pushing the data into BigQuery is very easy—just install the BigQuery Sink Connector with:

confluent-hub install wepay/kafka-connect-bigquery:latest

Next, configure it while applying the following parameters (amongst others):

  • topics: defines the topic to read (in our case RSS_STREAM_WITH_SENTIMENT_DETAILS and TWITTER_STREAM_WITH_SENTIMENT_DETAILS)
  • project: the name of the Google project that we’ll use for billing
  • datasets=.*=wwc: defines the BigQuery dataset name
  • keyfile=$GOOGLE_CRED/myGoogleCredentials.json: points to the JSON file containing Google's credentials (which in our case is the same file used in the Google Natural Language scoring)

Before starting the connector, we need to ensure the BigQuery dataset named wwc (as per configuration file) exists, otherwise, the connector will fail. To do so, we can log into BigQuery, select the same project defined in the configuration file, and click on CREATE DATASET. Then, we’ll need to fill in all the details (more information about the dataset creation in the Google documentation).

After creating the dataset, it’s time to start the connector with the Confluent CLI:

confluent load bigquery-connector -d $RSS_HOME/

If the Kafka sink works, we should see one table per topic defined in the configuration file, which are RSS_STREAM_WITH_SENTIMENT_DETAILS and TWITTER_STREAM_WITH_SENTIMENT_DETAILS in our case.

Of course, we can query the data from BigQuery itself.

Visual Analysis in Google Data Studio

To start analysing the data in Google Data Studio, simply connect to the related console and select “Blank Report.”

We’ll be asked which data source to use for the project. Thus, we need to set up a connection to the wwc dataset by clicking on CREATE NEW DATASOURCE and selecting BigQuery as the connection. Then, select the project, dataset, and table (TWITTER_STREAM_WITH_SENTIMENT_DETAILS).

We can then review the list of columns, types, and aggregations, adding the data source to the report.

Finally, we can start creating visualisations like tiles to show record counts, line charts for the sentiment trend, and bar charts defining the most used languages.

A more advanced visualisation like a scatterplot shows the most common hashtags and the associated average sentiment value.

Below is a map visualising the average sentiment by country.

Analysing and comparing sentiment scores

Now that we have the two streams of data coming from Twitter and RSS feeds, we can do the analysis in KSQL and, in parallel, visually in Google Data Studio. We can, for example, examine the average sentiment over a timeframe and check how one source sentiment score compares to the other.

On the 27th of June, the quarterfinal match between Norway and England was played, with the result being that England beat Norway 3–0. Let’s check if we can somehow find significant similarities in the sentiment scoring of our dataset.

Starting with the Twitter feed, we can check all the tweets including ENGLAND and NORWAY by filtering on the related hashtag #NORENG. To obtain each team's overall score, I'm then assigning to each team all the tweets containing the country's full name and aggregating the SENTIMENTSCORE with the following SQL:
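The SQL itself did not survive extraction; a sketch of the idea, with stream and column names assumed, using CASE to attribute each tweet's score to the team it mentions:

```sql
CREATE STREAM TWITTER_NORWAY_ENGLAND AS
SELECT
    -- attribute the score to a team when the tweet mentions its full name
    CASE WHEN UCASE(TEXT) LIKE '%ENGLAND%' THEN SENTIMENTSCORE END AS ENGLAND_SENTIMENTSCORE,
    CASE WHEN UCASE(TEXT) LIKE '%NORWAY%'  THEN SENTIMENTSCORE END AS NORWAY_SENTIMENTSCORE
FROM TWITTER_STREAM_WITH_SENTIMENT_DETAILS
WHERE UCASE(TEXT) LIKE '%#NORENG%';
```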


We can check the overall sentiment score associated with the two teams using:
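An aggregation along these lines (the TWITTER_NORWAY_ENGLAND stream and column names are assumed; note the GROUP BY 1 explained just after):

```sql
SELECT SUM(ENGLAND_SENTIMENTSCORE)/COUNT(ENGLAND_SENTIMENTSCORE) AS ENGLAND_AVG_SENTIMENT,
       SUM(NORWAY_SENTIMENTSCORE)/COUNT(NORWAY_SENTIMENTSCORE)   AS NORWAY_AVG_SENTIMENT
FROM TWITTER_NORWAY_ENGLAND
GROUP BY 1;
```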


The GROUP BY 1 is necessary since KSQL currently requires a GROUP BY clause when using aggregation functions like SUM. Since we don’t aggregate by any column other than the window time, we can use the number 1 as a fixed aggregator for the total. The result of the above query is in line with the final score, with the winner (England) having an average sentiment score of 0.212, and the loser (Norway) having a score of 0.0979.

We can also look at the behaviour per hour with the TUMBLING KSQL windowing function:
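The windowed version, again a sketch with assumed names, adds a one-hour tumbling window:

```sql
SELECT SUM(ENGLAND_SENTIMENTSCORE)/COUNT(ENGLAND_SENTIMENTSCORE) AS ENGLAND_AVG_SENTIMENT,
       SUM(NORWAY_SENTIMENTSCORE)/COUNT(NORWAY_SENTIMENTSCORE)   AS NORWAY_AVG_SENTIMENT
FROM TWITTER_NORWAY_ENGLAND
WINDOW TUMBLING (SIZE 1 HOUR)
GROUP BY 1;
```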


The query yields the following result:

Thu 27 Jun 16 | 0.18409091 | 0.16075758
Thu 27 Jun 17 | 0.14481481 | 0.13887096
Thu 27 Jun 18 | 0.14714406 | 0.12107647
Thu 27 Jun 19 | 0.07926398 | 0.34757579
Thu 27 Jun 20 | 0.10077705 | 0.13762544
Thu 27 Jun 21 | 0.08387538 | 0.17832865

We can clearly see that towards match time (19:00 BST), the ENGLAND average score has a spike, coinciding with England’s first goal in the third minute. We can see the same on the Line Chart in Google Data Studio.

We can do a similar exercise on top of the RSS feeds stream, but first, we need to somehow filter it to get only FIFA Women’s World Cup 2019 data, since the predefined connector is ingesting all the news from the RSS sources without a topic filter. To do so, we create a new stream filtering only contents containing WOMEN and CUP:
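The filter statement is missing here; a sketch, with the RSS_STREAM_WWC stream name and CONTENT column assumed:

```sql
CREATE STREAM RSS_STREAM_WWC AS
SELECT *
FROM RSS_STREAM_WITH_SENTIMENT_DETAILS
WHERE UCASE(CONTENT) LIKE '%WOMEN%'
  AND UCASE(CONTENT) LIKE '%CUP%';
```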


We can now analyse the overall RSS sentiment with:
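A sketch of the query, reusing the SUM-over-COUNT average described just below (stream name assumed):

```sql
SELECT SUM(SENTIMENTSCORE)/COUNT(SENTIMENTSCORE) AS AVG_SENTIMENT
FROM RSS_STREAM_WWC
GROUP BY 1;
```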


As before, the SUM(SENTIMENTSCORE)/COUNT(SENTIMENTSCORE) is calculating the average. We can then calculate the sentiment average for the selected team. Taking the same example of ENGLAND and NORWAY used previously, we just declare a stream filtering the sentiment for the two nations. For example:
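A sketch of that stream (the NORWAY_ENGLAND name appears in a later query; the source stream and column logic are assumed):

```sql
CREATE STREAM NORWAY_ENGLAND AS
SELECT
    CASE WHEN UCASE(CONTENT) LIKE '%NORWAY%'  THEN SENTIMENTSCORE END AS NORWAY_SENTIMENTSCORE,
    CASE WHEN UCASE(CONTENT) LIKE '%ENGLAND%' THEN SENTIMENTSCORE END AS ENGLAND_SENTIMENTSCORE
FROM RSS_STREAM_WWC;
```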


Then, we can analyse the separate scoring with:

-- column names assumed
SELECT SUM(NORWAY_SENTIMENTSCORE)/COUNT(NORWAY_SENTIMENTSCORE)   AS NORWAY_AVG_SENTIMENT,
       SUM(ENGLAND_SENTIMENTSCORE)/COUNT(ENGLAND_SENTIMENTSCORE) AS ENGLAND_AVG_SENTIMENT
FROM NORWAY_ENGLAND
WHERE ROWTIME > STRINGTODATE('2019-06-27', 'yyyy-MM-dd')
GROUP BY 1;

The result is an average sentiment of 0.0575 for Norway and 0.111 for England, again in line with the match result where England won 3–0.

We can also understand the variation of the sentiment over time by using KSQL windowing functions like TUMBLING:

-- column names assumed
SELECT SUM(NORWAY_SENTIMENTSCORE)/COUNT(NORWAY_SENTIMENTSCORE)   AS NORWAY_AVG_SENTIMENT,
       SUM(ENGLAND_SENTIMENTSCORE)/COUNT(ENGLAND_SENTIMENTSCORE) AS ENGLAND_AVG_SENTIMENT
FROM NORWAY_ENGLAND
WINDOW TUMBLING (SIZE 1 HOUR)
WHERE ROWTIME >= STRINGTODATE('2019-06-27', 'yyyy-MM-dd')
GROUP BY 1;

This yields the following results:

Thu 27 Jun 17 | 0.12876364 | 0.12876364
Thu 27 Jun 18 | 0.24957054 | 0.24957054
Thu 27 Jun 19 | 0.15606978 | 0.15606978
Thu 27 Jun 20 | 0.09970317 | 0.09970317
Thu 27 Jun 21 | 0.00809077 | 0.00809077
Thu 27 Jun 23 | 0.41298701 | 0.12389610

As expected from this source, most of the scores of the two countries are the same, since the number of articles is limited and almost all articles mention both ENGLAND and NORWAY.

Strangely, as we can see in the graph above, the NORWAY sentiment score on the 27th of June at 11:00 pm GMT (so after the match ended) is much higher than the ENGLAND one.

We can look at the data closely with:

from NORWAY_ENGLAND where TIMESTAMPTOSTRING(ROWTIME,'dd/MM HH') = '27/06 23';
 | 0.375974032  | 0.375974032   | null         | -0.45428572   | 0.449999988  | 0.449999988

We can see that  NORWAY is being associated with two articles: one from The Guardian with a positive 0.375 score and one from BBC with a positive 0.449 score. ENGLAND, on the other hand, is associated with another BBC article, having a negative -0.454 score.
We can also compare the hourly Twitter and RSS sentiment scores by creating two tables:

-- stream and column names partly assumed
CREATE TABLE TWITTER_NORWAY_ENGLAND_TABLE AS
SELECT SUM(ENGLAND_SENTIMENTSCORE)/COUNT(ENGLAND_SENTIMENTSCORE) AS ENGLAND_AVG_SENTIMENT,
       SUM(NORWAY_SENTIMENTSCORE)/COUNT(NORWAY_SENTIMENTSCORE)   AS NORWAY_AVG_SENTIMENT
FROM TWITTER_NORWAY_ENGLAND
WINDOW TUMBLING (SIZE 1 HOUR)
WHERE ROWTIME >= STRINGTODATE('2019-06-27', 'yyyy-MM-dd')
GROUP BY 1;

CREATE TABLE RSS_NORWAY_ENGLAND_TABLE AS
SELECT SUM(ENGLAND_SENTIMENTSCORE)/COUNT(ENGLAND_SENTIMENTSCORE) AS ENGLAND_AVG_SENTIMENT,
       SUM(NORWAY_SENTIMENTSCORE)/COUNT(NORWAY_SENTIMENTSCORE)   AS NORWAY_AVG_SENTIMENT
FROM NORWAY_ENGLAND
WINDOW TUMBLING (SIZE 1 HOUR)
WHERE ROWTIME >= STRINGTODATE('2019-06-27', 'yyyy-MM-dd')
GROUP BY 1;

The key of both the tables is the window start date, as we can see from:

ksql> select rowkey from RSS_NORWAY_ENGLAND_TABLE limit 1;
1 : Window{start=1561557600000 end=-}

We can then join the results together with the following statement:
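Since both tables share the same windowed key, a sketch of the table-table join (column aliases assumed):

```sql
SELECT T.ENGLAND_AVG_SENTIMENT,
       T.NORWAY_AVG_SENTIMENT,
       R.ENGLAND_AVG_SENTIMENT,
       R.NORWAY_AVG_SENTIMENT
FROM TWITTER_NORWAY_ENGLAND_TABLE T
JOIN RSS_NORWAY_ENGLAND_TABLE R
  ON T.ROWKEY = R.ROWKEY;
```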


This yields the following result:

Thu 27 Jun 17 | 0.14481481 | 0.13887096 | 0.12876364 | 0.12876364
Thu 27 Jun 18 | 0.14714406 | 0.12107647 | 0.24957054 | 0.24957054
Thu 27 Jun 19 | 0.07926398 | 0.34757579 | 0.15606978 | 0.15606978
Thu 27 Jun 20 | 0.10077705 | 0.13762544 | 0.09970317 | 0.09970317
Thu 27 Jun 21 | 0.08387538 | 0.17832865 | 0.00809077 | 0.00809077

And the end result appears similarly in a Data Studio Line Chart.

Interested in more?

If you’re interested in what KSQL can do, you can download the Confluent Platform to get started with the event streaming SQL engine for Apache Kafka. To help you get started, Rittman Mead provides a 30-day Kafka quick start package.

This article was originally posted on the Confluent blog.

Categories: BI & Warehousing

