Mobile/Metrics/Redash
This page explains how to use Mozilla's Re:dash query and visualization system.
Contents
Overview
Re:dash is a data collaboration and visualization platform. The system can connect to a variety of datasources. Mozilla already connects several datasources (Presto, Crash-DB and Sync) to our instance. Telemetry data (UT and UI) are extracted via Spark, stored as Parquet files and exposed to Presto DB into Re:dash. If you want to query and analyze Mobile event data, make sure you are using "Presto" (the default) when making an SQL query.
Presto is a distributed SQL query engine designed to query large data sets. Browse the documentation to learn more about the SQL syntax in Presto.
Data Tables / Schemas
If you want to explore the exposed data tables in Re:dash, use this query:
show tables
If you want to examine the structure of a specific table, use this query:
describe <table-name>
Mobile telemetry data is split into a few tables: android_clients_v1
, android_events_v1
, android_addons_v1
and mobile_clients_v1
.
The android_clients_v1
table holds opt-in data about common, mostly stable, information about individual Firefox installs. Each install has a unique clientid
UUID.
clientid varchar unique identifier profiledate timestamp when the installation was created, based on the profile submissiondate timestamp when the telemetry record was received by the server creationdate timestamp when the telemetry record was generated by the client appversion varchar version of Firefox osversion bigint version of Android SDK locale varchar locale code defaultsearch varchar current default search engine name device varchar device identifier string arch varchar CPU architecture channel varchar channel name (partition) submission varchar submission as a string 'YYYYMMDD' (partition)
Some of these fields can change during the lifetime of a Firefox installation. Some can't. A record is created for each telemetry ping, so it is possible to track the history of a client over time. You can see the default search engine change or the OS get updated, for a given client.
The android_events_v1
table holds opt-in event data (UI Telemetry). Each event is a record.
clientid varchar unique identifier submissiondate timestamp when the telemetry record was received by the server ts bigint relative timestamp associated with the event in milliseconds action varchar name of the event action method varchar name of the event method extras varchar extra contextual hint associated with the event sessions varchar sessions associated with the event (JSON array) experiments varchar a/b experiments associated with the event (JSON array) channel varchar channel name (partition) submission varchar submission as a string 'YYYYMMDD' (partition)
The android_addons_v1
table holds opt-in add-on and lightweight theme data. This table only has data for clients that have either: at least one add-on installed, or an active lightweight theme. Clients that have no add-ons installed and have no active lightweight theme are not included in the table. Use a left join
to android_clients_v1
or android_events_v1
to create a recordset with all clients included, but nulls for clients not in the android_addons_v1
table.
clientid varchar unique identifier submissiondate timestamp when the telemetry record was received by the server addons varchar list of installed add-ons, or null (JSON array) lwt varchar active lightweight theme, or null channel varchar channel name (partition) submission varchar submission as a string 'YYYYMMDD' (partition)
The mobile_clients_v1
table holds opt-out data about common, mostly stable, information about individual Firefox installs. Each install has a unique clientid
UUID.
clientid varchar unique identifier submissiondate timestamp when the telemetry record was received by the server profiledate timestamp when the installation was created, based on the profile creationdate timestamp when the telemetry record was generated by the client geocountry varchar country determined via client IP during submission locale varchar locale code os varchar name of the OS ('Android' or 'iOS') osversion varchar version OS (SDK number on Android and full version string on iOS) appversion varchar version of Firefox device varchar device identifier string arch varchar CPU architecture defaultsearch varchar current default search engine name distributionid varchar ID of the distribution partner, or null experiments varchar a/b experiments associated with the client (JSON array) channel varchar channel name (partition) submission varchar submission as a string 'YYYYMMDD' (partition)
Understanding Event Data
Firefox collects event data using the UI Telemetry system. The documentation explains some of the details of using the system, but doesn't really cover how it all comes together. The best way to know if a probe exists for an event is to look in the source.
Probes are added, and removed, all the time, so keeping an updated list is harder than you think. Let's cover a few groups of probes to give you an idea of how this all works.
Loading URLs
Any event that involves loading a URL, should use the loadurl.1
event action. Since we can load URLs many different ways, there will be different methods and different extra properties, which provide more context.
action | method | extras | description |
---|---|---|---|
loadurl.1 | griditem | 0 | loaded from top sites 0-indexed tile |
loadurl.1 | listitem | bookmarks | loaded from a list of bookmarks |
loadurl.1 | listitem | history | loaded from a list of history items |
loadurl.1 | suggestion | engine.0 | loaded from the first search engine suggestion |
UI Actions
Any button, menu or context menu event should use the action.1
event type. Most UI action use the widget ID as the event extras property.
action | method | extras | description |
---|---|---|---|
action.1 | menu | reload | choose reload from main menu |
action.1 | contextmenu | home_remove | remove an item from a home panel, probably a listitem |
action.1 | actionbar | select_all | select all text from the text edit actionbar |
action.1 | button | find_next | find next button |
Useful Queries
This guide doesn't try to teach SQL, but assumes you have a basic understanding.
Looking at Clients
Find the profile creation date for clients:
select distinct profiledate, clientid from android_clients_v1 where profiledate is not null
We use distinct
here to force a recordset with one clientid
and profiledate
per row, without duplicates.
Looking at Events
You want to look at basic UI Telemetry over the last 7 days, like the UI Telemetry dashboard:
select submissiondate as date, channel, action, method, extras, count(*) as count from android_events_v1 where submissiondate > current_date - interval '7' day group by 1, 2, 3, 4, 5
Same, but only those events that happened during a firstrun
session:
select submissiondate as date, channel, action, method, extras, count(*) as count from android_events_v1 where submissiondate > current_date - interval '7' day and sessions like '%firstrun%' group by 1, 2, 3, 4, 5
Let's breakdown loadurl.1
events into some subgroups over the last 7 days:
select submissiondate as date, count(action) as allloads, sum(case when extras = 'bookmarks' then 1 else 0 end) as bookmarks, sum(case when extras = 'reading_list' then 1 else 0 end) as readinglist, sum(case when extras = 'top_sites' then 1 else 0 end) as topsites, sum(case when extras = 'history' then 1 else 0 end) as history, sum(case when extras = 'frecency' then 1 else 0 end) as frecency, sum(case when method = 'actionbar' and extras = 'user' then 1 else 0 end) as user_typed, count(distinct clientid) as users from android_events_v1 where submissiondate > current_date - interval '7' day and action = 'loadurl.1' group by 1
Finding User Sessions
We don't have an explicit way to determine individual session of user activity in the application. It's pretty hard to have explicit "markers" because the real world tries to corrupt single events like a marker. Other systems use gaps in the flow of events to indicate breaks in user sessions. Google Analytics uses a 30 minute gap, but it really depends on the expected use of the application. Here is how we'd use our events to find a stream of user sessions:
select submissiondate, global_session_id, max(ts) - min(ts) as session_length, count(ts) as session_events from ( select clientid, submissiondate, ts, sum(is_new_session) over (order by clientid, submissiondate, ts) as global_session_id, sum(is_new_session) over (partition by clientid order by submissiondate, ts) as user_session_id from ( select *, -- let's use 5 minute gaps as an indicator of a session break case when (ts - last_ts) >= (1000 * 60 * 5) or last_ts is null then 1 else 0 end as is_new_session from ( select clientid, submissiondate, ts, lag(ts, 1) over (partition by clientid, submissiondate order by ts) as last_ts from android_events_v1 where submissiondate > current_date - interval '7' day and channel = 'beta' ) t ) f ) s group by 1, 2
Resources
Turns out that many other companies use event data and SQL, so we can benefit from what they have already done. Here are some blogs and posts that might help you build some SQL queries.
- Periscope Data has a blog with tons of posts on using SQL and event data for analyzing retention, churn and active users. They also cover some general SQL topics too.
- Mode Analytics covers using SQL on event data, but has a lot of general analysis posts too.
- Segment doesn't cover as much SQL as others, but does have posts on the process of data collection and analysis. Worth taking a look.
- Treasure Data has a FAQ post on techniques for optimizing Presto queries.