Google Cloud Datastore: how to serve data?
Like many, I'm new to the NoSQL world. I did a lot of research, but there is still one point I can't find a proper answer for. Short description of the system: I'm building a system that collects visitor data on different websites. Each visit is an entity in the Datastore, with properties like device type, IP, time of visit, etc. There will be millions of visits in the Datastore. My question is how to serve this data to clients. My data is sitting in the Datastore as "Visit" entities. When a customer logs in, I don't want to show them millions of records. I want to show them general stats instead, such as the number of visits on mobile devices or the number of visits from a specific country in some time range. Since I'm new to NoSQL databases, I'm not sure how I should go about showing these stats in the clients' dashboard. As I know, Datastore has no support for aggregates, or getting count of query results for example. I looked at BigQuery, but BigQuery works on Datastore "backups"; I need to serve data in real time, without doing backups manually. I also read about counters and sharded counters. Is this the proper approach? Should I have a counter for each client, for each property, for each tracking group, and show the totals this way? That sounds like too much for a simple purpose. Any input or explanation that can point me in the right direction would be highly appreciated. Best regards
Yes, counters are a good approach to your problem in terms of performance. They do have some downsides though: they take extra storage, and each time you want to introduce a new type of statistic you need to create a counter for it. In addition to your current "Visit" entities, you could store the aggregated data in sharded counters in the Datastore. These counters can be updated in real time, or via a task in one of your task queues. It would be fairly straightforward to write a task that builds the various counters from the existing Visit entities. Sharding is a way of creating multiple "underlying" entities that, when combined, represent some meaningful data. It is done to avoid performance problems caused by concurrent updates to a single entity. From the Google documentation: "If you had a single entity that was the counter and the update rate was too fast, then you would have contention as the serialized writes would stack up and start to timeout. The way to solve this problem is a little counter-intuitive if you are coming from a relational database; the solution relies on the fact that reads from the App Engine datastore are extremely fast and cheap. The way to reduce the contention is to build a sharded counter – break the counter up into N different counters. When you want to increment the counter, you pick one of the shards at random and increment it. When you want to know the total count, you read all of the counter shards and sum up their individual counts. The more shards you have, the higher the throughput you will have for increments on your counter. This technique works for a lot more than just counters and an important skill to learn is spotting the entities in your application with a lot of writes and then finding good ways to shard them." I would recommend having a look at the link for further information and some helpful examples.
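To make the quoted description concrete, here is a minimal sketch of the sharded-counter idea in Python. It is not the code from the Google documentation: a plain dict stands in for the Datastore, and the class name, the shard key format and the default of 20 shards are all assumptions made for illustration. In a real application each shard would be its own entity, and each increment would be a transactional read-modify-write on that one entity.

```python
import random


class ShardedCounter:
    """In-memory illustration of a sharded counter.

    A dict stands in for the Datastore here (an assumption for this
    sketch); each dict entry plays the role of one shard entity.
    """

    def __init__(self, name, num_shards=20, rng=None):
        self.name = name
        self.num_shards = num_shards      # hypothetical default, tune to write rate
        self._rng = rng or random.Random()
        self._shards = {}                 # shard key -> partial count

    def _shard_key(self, index):
        # Hypothetical key format: "<counter name>-shard-<index>"
        return "{}-shard-{}".format(self.name, index)

    def increment(self, delta=1):
        # Pick one shard at random so concurrent writers rarely
        # collide on the same entity.
        key = self._shard_key(self._rng.randrange(self.num_shards))
        self._shards[key] = self._shards.get(key, 0) + delta

    def total(self):
        # Read every shard and sum the partial counts.
        return sum(
            self._shards.get(self._shard_key(i), 0)
            for i in range(self.num_shards)
        )


if __name__ == "__main__":
    # One counter per client and statistic, e.g. mobile visits for client 42.
    mobile_visits = ShardedCounter("client42:visits:mobile")
    for _ in range(1000):
        mobile_visits.increment()
    print(mobile_visits.total())
```

Reading the total touches every shard, so the shard count is a trade-off: more shards give higher write throughput at the price of more reads per dashboard view.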
"As I know, Datastore has no support for aggregates, or getting count of query results for example." This is not true. You can get the number of entities returned by a query with one line of code. The query itself can be keys-only, which is very fast and basically free.
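As a rough sketch of what counting with a keys-only query can look like, the helper below assumes `client` behaves like a `google.cloud.datastore.Client` from the `google-cloud-datastore` Python library (`client.query(kind=...)`, `query.add_filter(...)`, `query.keys_only()` and `query.fetch()`). The function name `count_entities` and the `device_type` property are hypothetical, and the answer may have had a different SDK in mind.

```python
def count_entities(client, kind, filters=()):
    """Count entities of `kind` that match `filters` using a keys-only query.

    `client` is assumed to behave like google.cloud.datastore.Client;
    `filters` is an iterable of (property, operator, value) tuples.
    """
    query = client.query(kind=kind)
    for prop, op, value in filters:
        query.add_filter(prop, op, value)
    # Fetch only the keys, not the full entities.
    query.keys_only()
    # Iterate over the result and count; no entity bodies are loaded.
    return sum(1 for _ in query.fetch())


# Hypothetical usage, assuming credentials are configured:
#   from google.cloud import datastore
#   client = datastore.Client()
#   mobile = count_entities(client, "Visit", [("device_type", "=", "mobile")])
```

Note that this still walks every matching key, so the time taken grows with the number of results; for millions of visits per dashboard view, precomputed counters may remain the more practical choice.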