Google Cloud DataStore. How to serve data?


Like many, I'm new to the NoSQL world. I did a lot of research, but I'm still missing one point that I can't find a proper answer for.
Short description of system:
I'm building a system that collects visitor data on different websites. Each visit is an entity in the datastore, with properties like device type, IP, time of visit, etc.
There will be millions of visits in the datastore.
My question is how to serve this data to clients. My data is sitting in the datastore as "Visit" entities.
Now when a customer logs in, I don't want to show them millions of records. I want, for example, to show them general stats, like the number of visits on mobile devices, the number of visits from a specific country in some time range, and so on.
Since I'm new to NoSQL databases, I'm not sure how I should go about showing these stats in the clients' dashboard.
As far as I know, Datastore has no support for aggregates, or for getting the count of query results, for example.
I looked at BigQuery, but BigQuery works on Datastore "backups". I need to serve data in real time, without having to do backups manually.
I also read about counters and sharded counters. Is this the proper approach? Should I have a counter for each client, for each property, for each tracking group, and show the totals that way? That sounds like too much for a simple purpose.
Any input or explanation that can get me in the right direction would be highly appreciated.
Best Regards
Yes, counters are a good approach to your problem in terms of performance. They do have some downsides though, such as storage size and the fact that each time you would like to introduce a new type of statistic, you would need to create a counter for it.
In addition to your current "Visit" entities, you could opt for storing the aggregated data in Sharded Counters in the Datastore. These counters can be updated in real time, or via a Task in one of your task queues. It would be fairly straightforward to create a Task that builds the various counters from the existing Visit entities.
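As a rough illustration of "a counter per client per statistic", here is a minimal sketch of how one visit could fan out into several counter names. Everything in it is an assumption made for the example: the key format, the `device` and `country` fields, and the in-memory `Counter` that stands in for counter entities stored in the Datastore.

```python
from collections import Counter
from datetime import datetime

# In-memory stand-in for counter entities (hypothetical; real counters
# would be stored in the Datastore, possibly sharded).
counters = Counter()


def counter_keys(client_id, visit):
    """Return the names of all counters that a single visit should increment."""
    day = visit["time"].strftime("%Y-%m-%d")
    return [
        f"{client_id}:{day}:total",
        f"{client_id}:{day}:device:{visit['device']}",
        f"{client_id}:{day}:country:{visit['country']}",
    ]


def record_visit(client_id, visit):
    """Increment every counter affected by this visit."""
    for key in counter_keys(client_id, visit):
        counters[key] += 1


def total_for_range(client_id, metric, days):
    """Sum the daily counters for one metric over a list of 'YYYY-MM-DD' days."""
    return sum(counters[f"{client_id}:{day}:{metric}"] for day in days)


# Example data (made up for illustration).
visits = [
    {"time": datetime(2015, 3, 1, 10, 0), "device": "mobile", "country": "DE"},
    {"time": datetime(2015, 3, 2, 11, 30), "device": "mobile", "country": "US"},
    {"time": datetime(2015, 3, 2, 12, 15), "device": "desktop", "country": "DE"},
]
for visit in visits:
    record_visit("acme", visit)

mobile_visits = total_for_range("acme", "device:mobile", ["2015-03-01", "2015-03-02"])
print(mobile_visits)
```

With daily counters like these, a dashboard query for a time range reads one small entity per day and metric instead of scanning the Visit entities. The trade-off mentioned above still applies: each new statistic needs its own counter key.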
Sharding is a way of creating multiple "underlying" entities that, when combined, represent some meaningful data. Sharding is done to ensure that there are no performance issues due to concurrent updates.
From the Google Documentation:
If you had a single entity that was the counter and the update rate
was too fast, then you would have contention as the serialized writes
would stack up and start to timeout. The way to solve this problem is
a little counter-intuitive if you are coming from a relational
database; the solution relies on the fact that reads from the App
Engine datastore are extremely fast and cheap. The way to reduce the
contention is to build a sharded counter – break the counter up into N
different counters. When you want to increment the counter, you pick
one of the shards at random and increment it. When you want to know
the total count, you read all of the counter shards and sum up their
individual counts. The more shards you have, the higher the throughput
you will have for increments on your counter. This technique works for
a lot more than just counters and an important skill to learn is
spotting the entities in your application with a lot of writes and
then finding good ways to shard them.
I would recommend having a look at the link for further information and some helpful examples.
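The sharding technique described in the quoted passage can be sketched as follows. This is not the code from the Google documentation: it is an in-memory simulation in which a plain dictionary stands in for the shard entities, and the class name, shard naming scheme and shard count of 20 are all assumptions chosen for illustration.

```python
import random

NUM_SHARDS = 20  # assumed value; more shards allow more concurrent increments


class ShardedCounter:
    """Simulated sharded counter; each dict entry stands in for one shard entity."""

    def __init__(self, name, num_shards=NUM_SHARDS):
        self.name = name
        self.num_shards = num_shards
        self.shards = {}  # shard key -> count

    def increment(self, delta=1):
        # Pick one shard at random so concurrent writers rarely hit the same entity.
        index = random.randrange(self.num_shards)
        key = f"{self.name}-{index}"
        self.shards[key] = self.shards.get(key, 0) + delta

    def total(self):
        # Read every shard and sum the individual counts.
        return sum(self.shards.values())


counter = ShardedCounter("acme:device:mobile")
for _ in range(1000):
    counter.increment()

print(counter.total())  # prints 1000
```

In a real application each increment would be a transactional update of one shard entity, and the summed total would usually be cached, since reading it touches every shard.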
As I know, Datastore has no support for aggregates, or getting count
of query results for example.
This is not true. You can get the number of entities returned by a query with one line of code. The query itself can be keys-only, which is very fast and basically free.
