Anonymous Data: logging without DeviceID


#1

Hi. I am currently investigating whether we can use Countly as an enterprise analytics solution for Web, Android, iOS, MacOS and Windows applications. It is vital for us that the collected data is anonymous, so we cannot store DeviceIDs that would enable anyone to link the stored data to a particular device. Clearly this means we will not be able to use any of the user analytics, but we can still track how the apps are used, which features are accessed, etc.

I have been doing some R&D with the MacOS SDK. One way of making the data anonymous is to hardcode the DeviceID to a constant string so that all users have the same DeviceID. In this case Countly reports that there is only ever one user, which is to be expected. Are there any potential problems with this approach? Will having multiple concurrent sessions with the same DeviceID work? Do you have any other suggestions on how anonymity can be achieved?

Thanks

Andy Etheridge


#2

When there are concurrent sessions for the same user, a couple of problems arise:

  • anything that is processed on session end will be messed up
  • session cooldown will not work, so SDKs that rely on it will create too many sessions (currently only the Web SDK, afaik)
  • the SDK will need to add &ignore_cooldown=true to all end_session requests to prevent session cooldown from messing up the data

As a brief explanation: session cooldown is needed for cases where we cannot be 100% sure whether a session will end after some action. For example, on a website, when the user leaves a page, will they open another page on the same website and continue the session, or leave the website completely? That's why the Web SDK sends end_session every time the user leaves a page, and if the user opens another page within the session cooldown time, the session is automatically extended.

I think the OSX SDK is OK, as it ends the session when the app is closed, so you just need to add &ignore_cooldown=true to all end_session requests.
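
To illustrate what such a request could look like, here is a minimal Python sketch that only builds the URL. It assumes the usual /i write endpoint with app_key, device_id, end_session and session_duration parameters; only ignore_cooldown comes from this thread. The host, app key and device ID are placeholders, and the function name is made up for the example.

```python
from urllib.parse import urlencode


def end_session_url(server, app_key, device_id, duration_seconds):
    """Build an end_session request URL that opts out of session cooldown.

    Assumption: the server accepts these query parameters on /i.
    """
    params = {
        "app_key": app_key,
        "device_id": device_id,
        "end_session": 1,
        "session_duration": duration_seconds,
        # Flag from the thread: stops cooldown from merging sessions
        # that share the same device_id.
        "ignore_cooldown": "true",
    }
    return f"{server.rstrip('/')}/i?{urlencode(params)}"


# Placeholder values, not a real server or key.
url = end_session_url("https://countly.example.com", "YOUR_APP_KEY", "shared-device-id", 30)
print(url)
```

In a real SDK the flag would be appended wherever the end_session request is assembled, rather than in a separate helper like this.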

Now about which parts will work or not:

  • Live plugin: will always show 1 user online and a maximum of 1 user online. The only thing you would be able to see is whether someone is sending requests right now, but not how many users there are, etc.
  • Overview: all metrics apart from users should be correct
  • All sections under Analytics: will display correct session count, but incorrect users and new users data
  • Engagement -> User retention: won't work
  • Engagement -> User loyalty: won't work
  • Engagement -> Session frequency: won't work
  • Engagement -> Session duration: might have incorrect data, as it's processed on session end
  • Engagement -> Views per session: might have incorrect data, as it's processed on session end
  • Engagement -> Star rating: should work without problems
  • Engagement -> Slipping away users: won't work
  • Engagement -> Time of day plugin: should work without problems
  • Events: should work without problems
  • Messaging/Push: won't work
  • Drill: will work for events and sessions, but it is not possible to drill by user properties, and the users metric will be incorrect
  • Funnels: would not work, as one user could have started a funnel and another ended it, but for Countly they would all be one user
  • In App Purchases: would work apart from the users metric
  • Crashes: would work, apart from displaying correct user metrics such as affected users, etc.
  • Users Profiles: would show one user, and its profile page would load really slowly due to too many events, etc. for one user
  • Flows: would show some information, but it would probably be messed up due to concurrent sessions
  • Cohorts: no point in grouping users if you have only one
  • Attribution: would show clicks, but not installs
  • Management -> Email reports: would have incorrect user metrics
  • Management -> Alerts: user related alerts won't work

#3

I think more sections would work correctly if the device_id were randomly generated per session, so it would not be stored persistently. The app starts, you generate a device_id and use it throughout the whole session until the app is closed.
On the next start, you generate a new one.
Would that work with your anonymous data policy?

That way some parts, like the Live plugin and Flows, as well as some sections of Engagement, would show correct data.

But in this case, users would eventually pile up and could clog the server, so they would need to be cleaned out periodically based on their first-seen property or, for example, deleted on each end_session. I need to think more about this approach, but I think both things can be achieved with a custom plugin.
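
A rough sketch of the per-session identifier idea, in Python for brevity. The class name is hypothetical and this is not part of any Countly SDK; the point is only that the ID lives in memory and is never written to disk.

```python
import uuid


class SessionIdentity:
    """Holds a device_id that exists only in memory for one app run.

    Hypothetical helper for illustration, not a Countly SDK class.
    """

    def __init__(self):
        # Random UUID, not derived from any hardware or user data.
        self.device_id = uuid.uuid4().hex

    def regenerate(self):
        """Replace the ID, e.g. on login, and return the new value."""
        self.device_id = uuid.uuid4().hex
        return self.device_id


# Created once at app start; the value would be passed to the SDK
# as the custom device ID for this run only.
identity = SessionIdentity()
print(identity.device_id)
```

Because nothing is persisted, the next app start gets an unrelated ID, which is also why the server-side cleanup mentioned above would be needed.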


#4

Hi. Thanks for your quick response.
Yes, an in-memory identifier that is regenerated on either login or app start might be acceptable.
What is the upper limit on the number of users? We could limit the possible number of DeviceIDs, e.g. the DeviceID could be a random number from 0 to 100K?

Thanks

Andy


#5

Yes, that would work too.
It depends on the server, but problems may arise once users number in the tens or hundreds of millions; beyond that, the database may need to be sharded, etc.
But I think such a limit would work perfectly, with little chance of collision.
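
For anyone who wants to put a number on the collision chance, here is a small Python sketch. It assumes IDs are drawn uniformly at random from 0 to 99,999 and that what matters is how many distinct IDs are in use on the server at the same time; both function names are made up for the example.

```python
import math
import secrets

ID_SPACE = 100_000  # IDs 0..99999, following the "0 to 100K" idea above


def random_device_id(id_space=ID_SPACE):
    """Pick a random ID from the limited space, returned as a string."""
    return str(secrets.randbelow(id_space))


def collision_probability(active_ids, id_space=ID_SPACE):
    """Probability that at least two of `active_ids` random IDs coincide."""
    if active_ids > id_space:
        return 1.0
    # Sum logs of (1 - k/N) to avoid multiplying many small factors.
    log_no_collision = sum(math.log1p(-k / id_space) for k in range(active_ids))
    return 1.0 - math.exp(log_no_collision)


for n in (10, 100, 1000):
    print(n, round(collision_probability(n), 4))
```

Under these assumptions the chance of at least one clash is roughly 5% with 100 IDs in use at once and above 99% with 1,000, so how well the limit works depends on how many sessions overlap and how often old users are cleaned out.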


#6

Hi, is it possible to set the DeviceId with the Windows SDK, as Andy has done for MacOS?
Thanks,
Kathy


#7

We can start working on it after the new release (18.06). Would that be good for you?


#8

Hi, I think I have found a way to achieve this - would the following be correct? I am calling it immediately before Countly.StartSession()
[image not preserved]


#9

Hi, that might work currently, and only on a few Windows SDK targets, but it isn't really a supported method.

In the next release that will not work anymore. But if all goes well, we should have an official method for changing the device ID.


#10

Is this supported now?

I'm using the 18.10.0 version of the SDK, and the workaround mentioned above does not work any more. What is the new way of achieving this goal now?


#11

Hello,

Yes, that workaround has been removed, but you should be able to achieve what you want with the new calls.

You can either provide a custom device ID during init or change it afterwards.

More info on that can be found here:

If you still have any questions, let me know.


#12

Sorry, I didn't make myself very clear.

I am indeed able to set the DeviceID to an arbitrary value.

What I now want is for the device name to be set to that same value (or an empty string).

I'm talking about the value that shows up in the "Device" column on the user profile dashboard.
We cannot allow the machine name of our users to be tracked.

How do I accomplish this?