Data Info and Missing Data

Android Phone Sensors Missing Data

Notes on peculiarities of Android phone sensors, data incompleteness, known issues, and workarounds.

See the page on phone sensors data for documentation of the sensors, and the specification documents on GitHub.

Values not recorded if unchanged

Two data streams, the light sensor and the battery level, record data only when the value changes. The data is therefore collected at very irregular intervals.
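Because these streams are change-only, analyses that need a regular time grid usually have to carry the last recorded value forward. The following is a minimal sketch of that idea, assuming samples are available as sorted (timestamp in seconds, value) pairs; the function name and the sample data are hypothetical, not part of any project code.

```python
from bisect import bisect_right

def forward_fill(samples, start, end, step):
    """Resample change-only samples onto a regular grid.

    samples: list of (timestamp_seconds, value) tuples sorted by timestamp.
    Returns a list of (grid_time, value); value is None before the first sample.
    """
    times = [t for t, _ in samples]
    grid = []
    t = start
    while t <= end:
        i = bisect_right(times, t)  # number of samples at or before t
        value = samples[i - 1][1] if i > 0 else None
        grid.append((t, value))
        t += step
    return grid

# Hypothetical battery-level readings, recorded only when the level changed.
battery = [(0, 0.90), (130, 0.89), (610, 0.88)]
regular = forward_fill(battery, start=0, end=600, step=300)
```

Forward-filling assumes that a gap means the value did not change. A long gap can also mean that no data was collected at all, so it may be worth flagging gaps above some threshold instead of filling them blindly.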

Night time values may get limited by Android

There is some indication that overnight Android makes assumptions about the variability of sensor activity and may limit the data collected.

See the issue raised in relation to the Light and Battery sensors on Android.

Phone sensor details for pRMT

Environment Sensor → Light

Other → Battery 

Other Android limitations

Android may be more aggressively limiting phone sensor data collection

RSD-13 Text Messages Contacts Status can be NULL

See schemas: the value is unknown (NULL) if the contact is unknown; do not assume that it is false.

It is not possible to determine whether the IDs of OUTGOING messages belong to contacts (as far as we are aware, this is an API limitation).

See the source code: the value is not initialized for outgoing messages.

Possible Solutions:

Some of the contacts will become known later, because the same hash value is seen in an INCOMING message. At that point it may be possible to populate a TRUE value into some of the OUTGOING messages whose status was previously unknown. See the image attached to the issue.
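The backfill described above could be sketched as follows. This is only an illustration: the record layout and the key names `hashed_id`, `type` and `is_contact` are hypothetical stand-ins for whatever the real schema uses, and `None` represents the NULL status.

```python
def backfill_contact_status(messages):
    """Return copies of messages where an unknown contact status is set to
    True if the same hashed ID appears in a message known to be a contact."""
    known_contacts = {
        m["hashed_id"] for m in messages if m["is_contact"] is True
    }
    result = []
    for m in messages:
        m = dict(m)  # copy so the input records stay untouched
        if m["is_contact"] is None and m["hashed_id"] in known_contacts:
            m["is_contact"] = True
        result.append(m)
    return result

# Hypothetical records; None stands for a NULL contact status.
messages = [
    {"time": 1, "type": "OUTGOING", "hashed_id": "a1", "is_contact": None},
    {"time": 2, "type": "INCOMING", "hashed_id": "a1", "is_contact": True},
    {"time": 3, "type": "OUTGOING", "hashed_id": "b2", "is_contact": None},
]
fixed = backfill_contact_status(messages)
```

The sketch only ever upgrades an unknown status to TRUE. It never writes FALSE, since an unknown status must not be read as "not a contact".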


Sleep stages Unknown

There is a bug in the sleep stage mapping: the "wake" sleep stage is mapped to "UNKNOWN". During analysis, any UNKNOWN sleep stage can therefore be mapped to AWAKE.
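A minimal sketch of that remapping during analysis is shown below. Only "UNKNOWN" and "AWAKE" come from the note above; the other stage labels and the function name are illustrative assumptions.

```python
def fix_sleep_stage(stage):
    """Map the mislabelled UNKNOWN stage back to AWAKE; leave others as-is."""
    return "AWAKE" if stage == "UNKNOWN" else stage

# Illustrative stage sequence; labels other than UNKNOWN/AWAKE are assumed.
stages = ["LIGHT", "UNKNOWN", "DEEP", "UNKNOWN", "REM"]
fixed_stages = [fix_sleep_stage(s) for s in stages]
```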


The android_phone_usage_event data is sent by the passive app, which does not know the categories of the apps, so the category is empty.
This data is then processed in the backend, where app categories from the Play Store are added, and the result is written to android_phone_usage_event_output.
Both topics use the same schema.

The android_phone_usage_event_aggregated data uses android_phone_usage_event_output as input and creates windowed phone usage events with the duration of usage of each app.
A bug used to produce files named unknown_date.csv in the android_phone_usage_event_aggregated data; it has been fixed, so files should now be named by time.
Note also that android_phone_usage_event_aggregated and android_phone_usage_event_output depend on a streams application on the server, which has been stopped due to resource constraints on the backend, so this data may no longer be updated. It can, however, be calculated from android_phone_usage_event together with Play Store HTML parsing.
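As a rough illustration of recomputing usage durations from the raw events, the sketch below pairs each foreground event with the next background event of the same app and sums the time in between. It is not the server-side streams application: the key names `time`, `packageName` and `eventType` are assumptions about the record layout, and the windowing and Play Store category lookup are left out.

```python
from collections import defaultdict

def usage_durations(events):
    """Sum foreground time (in seconds) per package.

    events: iterable of dicts with keys "time" (seconds), "packageName"
    and "eventType"; these key names are assumptions for this sketch.
    """
    opened_at = {}
    totals = defaultdict(float)
    for e in sorted(events, key=lambda e: e["time"]):
        pkg = e["packageName"]
        kind = e["eventType"].upper()  # tolerate "Foreground" or "FOREGROUND"
        if kind == "FOREGROUND":
            # Keep the earliest unmatched foreground time for this package.
            opened_at.setdefault(pkg, e["time"])
        elif kind == "BACKGROUND" and pkg in opened_at:
            totals[pkg] += e["time"] - opened_at.pop(pkg)
    return dict(totals)

# Hypothetical events; the last foreground event has no matching background.
events = [
    {"time": 0.0, "packageName": "com.example.mail", "eventType": "FOREGROUND"},
    {"time": 40.0, "packageName": "com.example.mail", "eventType": "BACKGROUND"},
    {"time": 45.0, "packageName": "com.example.maps", "eventType": "FOREGROUND"},
    {"time": 105.0, "packageName": "com.example.maps", "eventType": "BACKGROUND"},
    {"time": 200.0, "packageName": "com.example.mail", "eventType": "FOREGROUND"},
]
durations = usage_durations(events)
```

Unmatched foreground events, such as the last one here, are ignored rather than guessed at. A real analysis would need a rule for sessions cut off by shutdown or missing data.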

The eventType will only be "Foreground" or "Background", as stated in the schema doc. The other values were added in case they are needed in the future.

The user interaction states are as suggested by their names:

  • "STANDBY" : The phone is on, locked, and on standby (not being used).
  • "UNLOCKED" : The event denoting the act of unlocking the phone from the lock screen.
  • "SHUTDOWN" : The event denoting the act of shutting down the phone, either by the user or automatically (e.g. a dead battery).
  • "BOOTED" : The event denoting the act of switching on the phone (from a shutdown state).
  • "OTHER" : Any other event not denoted by the above types.

TIMEZONE Information

The time zone in the topic "application_time_zone" is provided by the pRMT app, while the time zone in "connect_fitbit_time_zone" is provided by Fitbit.
There is also a third time zone topic, "questionnaire_timezone", which is provided with every completed questionnaire from the aRMT app.

As mentioned here, the application_time_zone data is only sent on updates.
It may be worth combining the three topics and using whichever data is available at a particular time point. The app's time zone data should be preferred over Fitbit's, as the Fitbit data may be stale or outdated (Fitbit does not sync as regularly as pRMT).
However, the best choice also depends on the source of the data.
For example, for Fitbit data, using connect_fitbit_time_zone makes more sense; for questionnaire data, using questionnaire_timezone makes more sense; and so on.
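One way to combine the three topics is to look up, for a given timestamp, the most recent time zone record in each topic and take the first one available in a preference order that depends on the data being analysed. The sketch below assumes each topic has been reduced to sorted (timestamp in seconds, UTC offset in seconds) pairs; that representation, the function names, and the sample values are hypothetical.

```python
from bisect import bisect_right

def latest_offset(updates, timestamp):
    """Return the most recent offset at or before timestamp, or None."""
    times = [t for t, _ in updates]
    i = bisect_right(times, timestamp)
    return updates[i - 1][1] if i > 0 else None

def pick_offset(timestamp, sources, preference):
    """Return (topic, offset) from the first preferred topic that has data."""
    for name in preference:
        offset = latest_offset(sources.get(name, []), timestamp)
        if offset is not None:
            return name, offset
    return None, None

# Hypothetical (timestamp, utc_offset_seconds) records per topic.
sources = {
    "application_time_zone": [(1000, 3600), (5000, 7200)],
    "connect_fitbit_time_zone": [(500, 0)],
    "questionnaire_timezone": [(3000, 3600)],
}
phone_preference = [
    "application_time_zone", "questionnaire_timezone", "connect_fitbit_time_zone",
]
fitbit_preference = [
    "connect_fitbit_time_zone", "application_time_zone", "questionnaire_timezone",
]

choice_phone = pick_offset(6000, sources, phone_preference)
choice_early = pick_offset(800, sources, phone_preference)
choice_fitbit = pick_offset(6000, sources, fitbit_preference)
```

Since application_time_zone is only sent on updates, the last record at or before the timestamp is treated as still valid. The sketch always takes the preferred topic when it has any earlier record, even an old one; a stricter variant could also compare record ages across topics.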