MongoDB Aggregation count over a relation |
You don't say how orders relate to buildings in your schema, but if an order
references a building id or name, just group by that:
db.orders.aggregate([
    { $group : { _id: "$buildingId", sum: { $sum: 1 } } },
    /* $sort by sum:-1, $limit:10 like you already have */
])
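If it helps to see the shape of the computation, here is a plain-JavaScript sketch of the same group/sort/limit against an in-memory array (not server-side code; the buildingId field name is the same assumption as above):

```javascript
// Count orders per buildingId ($group/$sum:1), then apply the
// $sort: {sum:-1} and $limit the question already had.
function topBuildings(orders, limit) {
  const counts = {};
  for (const o of orders) {
    counts[o.buildingId] = (counts[o.buildingId] || 0) + 1;
  }
  return Object.entries(counts)
    .map(([b, sum]) => ({ _id: b, sum }))
    .sort((a, b) => b.sum - a.sum) // $sort: { sum: -1 }
    .slice(0, limit);              // $limit
}

console.log(topBuildings(
  [{ buildingId: 'A' }, { buildingId: 'A' }, { buildingId: 'B' }], 10
));
// [ { _id: 'A', sum: 2 }, { _id: 'B', sum: 1 } ]
```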
|
How to count Booleans in MongoDB with aggregation framework |
The $project stage is your friend in the pipeline: it lets you create new
fields whose types and values differ from the original fields.
Consider this projection, which uses $cond to produce one value when
something is true and another when it's false:
{ $project : { numWhoOwnHome : { $cond : [ "$OwnsAHome", 1, 0 ] } } }
If you now do a $group with {$sum : "$numWhoOwnHome"} your result will be
the number of people who had OwnsAHome set to true.
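As a sanity check, the $cond/$sum combination can be emulated in plain JavaScript over an in-memory array (a sketch, not the server-side pipeline; the sample people array is made up):

```javascript
// Emulates { $project: { numWhoOwnHome: { $cond: ['$OwnsAHome', 1, 0] } } }
// followed by { $group: { _id: null, total: { $sum: '$numWhoOwnHome' } } }
function countTrue(docs, field) {
  return docs
    .map(d => (d[field] ? 1 : 0))    // $project with $cond
    .reduce((sum, n) => sum + n, 0); // $group with $sum
}

const people = [
  { name: 'a', OwnsAHome: true },
  { name: 'b', OwnsAHome: false },
  { name: 'c', OwnsAHome: true }
];
console.log(countTrue(people, 'OwnsAHome')); // 2
```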
|
mongoDB - aggregation on aggregation? |
Here's how to do it. The key is that (a) you need to do $group twice and
(b) you need to first $group by the thing you want to sub-total and then
$group to get totals.
db.records.aggregate(
{$group:
{_id : {d:"$device",c:"$carrier"},
subtotal:{$sum:1}}
},
{$group:
{_id:"$_id.c",
devices:{$push:{device:"$_id.d", subtotal:"$subtotal"}},
total:{$sum:"$subtotal"}}
}
)
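The two-stage grouping can be sketched in plain JavaScript against an in-memory array, which makes the subtotal-then-total flow easy to follow (the sample records are made up):

```javascript
// First $group: subtotal per (device, carrier) pair.
// Second $group: roll subtotals up per carrier, pushing device subtotals.
function groupTwice(records) {
  const subtotals = {};
  for (const r of records) {
    const key = r.device + '|' + r.carrier;
    subtotals[key] = (subtotals[key] || 0) + 1;
  }
  const byCarrier = {};
  for (const [key, subtotal] of Object.entries(subtotals)) {
    const [device, carrier] = key.split('|');
    byCarrier[carrier] = byCarrier[carrier] || { devices: [], total: 0 };
    byCarrier[carrier].devices.push({ device, subtotal });
    byCarrier[carrier].total += subtotal;
  }
  return byCarrier;
}

const out = groupTwice([
  { device: 'phone', carrier: 'x' },
  { device: 'phone', carrier: 'x' },
  { device: 'tablet', carrier: 'x' },
  { device: 'phone', carrier: 'y' }
]);
console.log(out.x.total); // 3
```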
|
Perform a query with aggregation and grouping by joined tables |
Just do
SELECT 'DOMESTIC_CAT', c.color, count(*)
FROM domestic_cat d INNER JOIN cat c ON c.id = d.cat_id
GROUP BY c.color
UNION ALL
SELECT 'PERSIAN_CAT' .... the same for the other table
|
How to perform a single aggregation task on GAE once a given set of fan-out tasks complete |
Waiting for each batch to complete makes your process much more serial - it
will take longer to run that way.
If high numbers of varargs are a problem, as a workaround you could have a
fan-in task corresponding to each fan-out, assuming fan-out doesn't branch
more than about 10 tasks at a time.
|
is it possible to use "$where" in mongodb aggregation functions |
MongoDB doesn't support $where in the aggregation pipeline, and I hope it
never will, because JavaScript slows things down. Nevertheless, you still
have options:
1) Maintain an additional field (e.g. app_name_len) that stores the
app_name length, and query it when needed.
2) You can try the extremely slow MapReduce framework, where you are
allowed to write aggregations in JavaScript.
|
MongoDB Using Map Reduce against Aggregation |
Take a look at this.
The Aggregation Framework's results are stored in a single document, so
they are limited to 16 MB; this might not be suitable for some scenarios.
With MapReduce there are several output types available, including an
entire new collection, so it has no such space limit.
Generally, MapReduce is better when you have to work with large data sets
(maybe even the entire collection). Furthermore, it gives you much more
flexibility (you write your own aggregation logic) instead of restricting
you to a handful of pipeline operators.
|
MongoDB Aggregation Framework |
Your schema as designed would make using anything but a MapReduce difficult
as you've used the keys of the object as values. So, I adjusted your schema
to better match with MongoDB's capabilities (in JSON format as well for
this example):
{
'_id' : 'star_wars',
'count' : 1234,
'spellings' : [
{ spelling: 'Star wars', total: 10},
{ spelling: 'Star Wars', total : 15},
{ spelling: 'sTaR WaRs', total : 5} ]
}
Note that it's now an array of objects with a specific key name, spelling,
and a value for the total (I didn't know what that number actually
represented, so I've called it total in my examples).
On to the aggregation:
db.so.aggregate([
{ $unwind: '$spellings' },
{ $project: {
'spelling' : '$spellings.spelling',
'total': '$spellings.total'
|
mongodb aggregation addToSet with sum |
I got it to work with this pipeline.
var unwind = { "$unwind" : "$ids" };
var group = {
"$group" : {
"_id" : {
"account" : "$account",
"id" : "$ids.id"
},
"idSalesAmount" : {
"$sum" : "$ids.salesAmount"
},
"idSalesCount" : {
"$sum" : "$ids.salesCount"
}
}};
var group2 = {
"$group" : {
"_id" : "$_id.account",
"ids" : {
"$addToSet" : {
"id" : "$_id.id",
"salesAmount" : "$idSalesAmount",
"saleCount" : "$idSalesCount"
}
}
}};
var project = { "$project" : { "account" : "$_id", "ids" : 1, "_id" : 0 } };
db.mytest.aggregate([unwind, group, group2, project])
|
Absolute value with MongoDB aggregation framework |
It's not directly available, but you can do it using a $cond operator and a
$subtract within a $project like this (as a JavaScript object):
{ $project: {
amount: {
$cond: [
{ $lt: ['$amount', 0] },
{ $subtract: [0, '$amount'] },
'$amount'
]
}}}
So if amount < 0, then 0 - amount is used, otherwise amount is used
directly.
UPDATE
As of the 3.2 release of MongoDB, you can use the new $abs aggregation
expression operator to do this directly:
{ $project: { amount: { $abs: '$amount' } } }
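The $cond/$subtract trick is easy to verify in plain JavaScript (an in-memory sketch of the projection, not server-side code):

```javascript
// Plain-JS equivalent of the $cond/$subtract trick: if amount < 0,
// use 0 - amount, otherwise pass amount through unchanged.
function projectAbs(docs) {
  return docs.map(d => ({
    amount: d.amount < 0 ? 0 - d.amount : d.amount
  }));
}

console.log(projectAbs([{ amount: -7 }, { amount: 3 }]));
// [ { amount: 7 }, { amount: 3 } ]
```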
|
Mongodb Aggregation Framework and timestamp |
Here is a way you can do it by generating the aggregation pipeline
programmatically:
numberOfMonths=24; /* number of months you want to go back from today's date */
now=new Date();
year=now.getFullYear();
mo=now.getMonth();
months=[];
for (i=0;i<numberOfMonths;i++) {
m1=mo-i+1; m2=m1-1;
d = new Date(year,m1,1);
d2=new Date(year,m2,1);
from= d2.getTime()/1000;
to= d.getTime()/1000;
dt={from:from, to:to, month:d2}; months.push(dt);
}
prev="$nothing";
cond={};
months.forEach(function(m) {
cond={$cond: [{$and :[ {$gte:["$_id",m.from]}, {$lt:["$_id",m.to]}
]}, m.month, prev]};
prev=cond;
} );
/* now you can use "cond" variable in your pipeline to generate month */
db.collection.aggregate( { $project: { month: cond , value:1 } },
|
MongoDB - Aggregation Framework, PHP and averages |
As Asya said, the aggregation framework isn't usable for the last part of
your problem (averaging gaps in "hits" between documents in the pipeline).
Map/reduce also doesn't seem well-suited to this task, since you need to
process the documents serially (and in a sorted order) for this computation
and MR emphasizes parallel processing.
Given that the aggregation framework does process documents in a sorted
order, I was brainstorming yesterday about how it might support your use
case. If $group exposed access to its accumulator values during the
projection (in addition to the document being processed), we might be able
to use $push to collect previous values in a projected array and then
inspect them during a projection to compute these "hit" gaps.
Alternatively, if there was some facility
|
Mongodb Aggregation $group, $sum and $sort |
The solution was:
dir.aggregate(
[
{ $group:
{_id: {fecha:"$date", hora: "$start"},
llamadas :{$sum:"$total_calls"},
answer :{ $sum:"$answer_calls"},
abandoned: {$sum:"$abandoned_calls"},
mail: {$sum:"$voicemail_calls"} }
},
{ $sort:
{'_id.fecha':1 , '_id.hora':1} }
] )
Thank you again to Sammaye and JohnnyHK
|
MongoDB Aggregation Multiple Keys |
mapReduce can be used to solve the problem.
1) define the following map function
var mapFunction = function() {
var key = this.department;
var nb_match_bar2 = 0;
var status_exist = 0;
var status_absent = 0;
if( this.status=="exist" ){
status_exist = 1;
}else{
status_absent= 1;
}
var value = {
department: this.department,
statuses:{
exist: status_exist,
absent: status_absent
}
};
emit( key, value );
};
2) define the reduce function
var reduceFunction = function(key, values) {
var reducedObject = {
department: key,
statuses: {
exist: 0,
absent:0
}
};
values.forEach( function(value) {
reducedObject.statuses.exist += value.statuses.exist;
reducedObject.statuses.absent += value.statuses.absent;
});
return reducedObject;
};
|
Mongodb aggregation not working with mongoose |
Your query seems to be formatted correctly; I think you've just projected
"contacts" when you should have projected "list". I tried to format my
data like yours, and the following queries worked for me. In the shell:
db.accounts.aggregate(
{ $unwind:"$contacts" },
{ $group: {
_id: '$_id',
list: { $push:'$contacts.contactId' }
}
},
{ $project: { _id: 0, list: 1 }} )
or, using the mongoose framework,
Account.aggregate(
{ $unwind:"$contacts" },
{ $group: {
_id: '$_id',
list: { $push:'$contacts.contactId' }}},
{ $project: {
_id: 0,
list: 1 }},
function (err, res) {
if (err) //handle error;
console.log(res);
}
);
Since you've tried to suppress the "_id" field in the final output o
|
MongoDB Aggregation $group and categorise |
Damo, one thing that you must keep in mind is that when you want to group
by a value, you probably have to use $cond operator.
db.esbtrans.aggregate({
    $group : {
        _id : "$messageflow",
        errors : { $sum : { $cond : [ { $eq : ["$status", "ERR"] }, 1, 0 ] } },
        successes : { $sum : { $cond : [ { $eq : ["$status", "OK"] }, 1, 0 ] } }
    }
})
Explaining:
I group by messageflow because this field is your basic axis. Then, to
count the number of errors and successes, I use the $sum operator in
combination with $cond and $eq: it simply checks whether status is ERR or
OK and sums accordingly.
|
State-dependent aggregation in MongoDB |
I'm going to answer my own question with the solution I ended up implementing.
I realized that I was really interested in the previous state of each
document. In my case documents are inserted in large batches in temporal
order. So, I simply created a state_prev field and a delta field (the
difference between sequential documents' timestamp values).
{
timestamp: Number,
state: Number,
state_prev: Number,
delta: Number
}
I'm now able to $sum the new delta field and $group by the state_prev field
to achieve my desired aggregate computation.
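A plain-JavaScript sketch of that precomputation (field names mirror the schema above; the sample documents are made up):

```javascript
// Walk the documents in timestamp order, filling state_prev and delta
// (the gap to the previous document's timestamp).
function addPrevAndDelta(docs) {
  const sorted = [...docs].sort((a, b) => a.timestamp - b.timestamp);
  return sorted.map((d, i) => ({
    ...d,
    state_prev: i > 0 ? sorted[i - 1].state : null,
    delta: i > 0 ? d.timestamp - sorted[i - 1].timestamp : 0
  }));
}

// Then $sum the delta field grouped by state_prev.
function sumDeltaByPrevState(docs) {
  const totals = {};
  for (const d of addPrevAndDelta(docs)) {
    if (d.state_prev === null) continue;
    totals[d.state_prev] = (totals[d.state_prev] || 0) + d.delta;
  }
  return totals;
}

const totals = sumDeltaByPrevState([
  { timestamp: 0, state: 1 },
  { timestamp: 5, state: 2 },
  { timestamp: 9, state: 1 }
]);
console.log(totals); // { '1': 5, '2': 4 }
```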
|
MongoDB - Aggregation, group by an array value |
Unfortunately, that isn't possible using aggregation with your schema. The
problem is that aggregation is meant to operate over values in an array
selected by the $group clause, where those elements contain all the data
needed. Your setup separates what you want to group by from what you want
to sum. You could use a mapReduce job to do what you want with your
schema. http://docs.mongodb.org/manual/core/map-reduce/ should be able
to get you started.
Let me know if you have any other questions.
Best,
Charlie
|
MongoDB aggregation framework match OR |
$match: { $or: [{ author: 'dave' }, { author: 'john' }] }
Like so; the $match operator just takes what you would normally put into
the find() function.
|
Mongodb query help: aggregation within json |
You might want to try:
db.coll.aggregate( [
{ $match: { Date: /20130202/ } },
{ $group: { _id: null,
sport: { $sum: "$category.sport" },
national: { $sum: "$category.national" },
international: { $sum: "$category.international" },
finance: { $sum: "$category.finance" },
others: { $sum: "$category.others" },
tech: { $sum: "$category.tech" },
Music: { $sum: "$category.Music" }
} }
] )
|
Array de-aggregation with repetition in mongodb |
You should be able to do this with a simple $unwind.
For your example above you can use:
db.current.aggregate({$unwind: "$longcollection"})
This will give you a result like this:
{
"result" : [
{
"_id" : ObjectId(...),
"name": xxxx,
"othervar": yyyyy,
"longcollection" : {
"first": 1,
"second":2
}
},
{
"_id" : ObjectId(...),
"name": yyyy,
"othervar": zzzz,
"longcollection" : {
"first": 3,
"second":4
}
}],
"ok" : 1
}
To stop the duplicate _id message seen in the comment you should be able to
use:
db.current.aggregate({$project : {_id: 0, name: 1, othervar:
|
MongoDB double $group aggregation |
Try this pipeline:
[
{$unwind:"$results"},
{$match: {"results.discipline":{$in:["football", "basketball"]}}},
{$group: {_id:{player_id:"$player_id",league_id:"$league_id"},
'average':{$avg:"$results.score"}}}
]
it works for me with your doc:
{
"result" : [
{
"_id" : {
"player_id" : 0,
"league_id" : 2
},
"average" : 23.195
}
],
"ok" : 1
}
UPD. If you want to group again, by league_id:
[{$unwind:"$results"},
{$match: {"results.discipline":{$in:["football", "basketball"]}}},
{$group:{_id:{player_id:"$player_id",league_id:"$league_id"},
'average':{$avg:"$results.score"} }},
{$group:{_id:"$_id.league_id", 'average':{$avg:"$average"} }} ]
{ "result" : [ { "_id" : 2, "average" : 23.195 } ], "ok" : 1 }
|
R- Perform operations on column and place result in a different column, with the operation specified by the output column's name |
It seems strange that you would store your operations in your column names,
but I suppose it is possible to achieve what you're after.
As always, sample data helps.
## Creating some sample data
mydf <- setNames(data.frame(matrix(1:9, ncol = 3)),
c("L1", "L2", "L3"))
## The operation you want to do...
morecols <- c(
combn(names(mydf), 2, FUN=function(x) paste(x, collapse = "+")),
combn(names(mydf), 2, FUN=function(x) paste(x, collapse = "-"))
)
## THE FINAL SAMPLE DATA
mydf[, morecols] <- NA
mydf
# L1 L2 L3 L1+L2 L1+L3 L2+L3 L1-L2 L1-L3 L2-L3
# 1 1 4 7 NA NA NA NA NA NA
# 2 2 5 8 NA NA NA NA NA NA
# 3 3 6 9 NA NA NA NA NA NA
One solution could be to use eval(parse(...)) within lapply to perform the
calculations.
|
Mongodb Aggregation framework group and sort |
You can do this:
db.collection.aggregate(
{$sort:{"time":1}},
{ $group:
{ _id: "$sessionId",
messages: { "$push": {message: "$msg", time: "$time"} }
}
}
)
This will sort the collection by time and then group by session id. Each
sessionId group will have an array of sub-documents containing the message
and its time. Because the documents are sorted before being pushed, the
messages in each array will be ordered by time.
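The sort-then-push ordering guarantee can be illustrated with an in-memory JavaScript sketch (the sample documents are made up):

```javascript
// In-memory version of "sort by time, then push per sessionId": because
// the input is sorted first, each session's messages array comes out in
// time order.
function groupMessages(docs) {
  const bySession = {};
  for (const d of [...docs].sort((a, b) => a.time - b.time)) {
    (bySession[d.sessionId] = bySession[d.sessionId] || [])
      .push({ message: d.msg, time: d.time });
  }
  return bySession;
}

const grouped = groupMessages([
  { sessionId: 's1', msg: 'second', time: 2 },
  { sessionId: 's1', msg: 'first', time: 1 }
]);
console.log(grouped.s1.map(m => m.message)); // [ 'first', 'second' ]
```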
|
How to do Mongodb aggregation arithmetic operator in Java? |
Note that the value of $multiply operator should be an array not an object.
So, in Java the code will be:
BasicDBList args = new BasicDBList();
args.add(myField);
args.add(0);
new BasicDBObject("$multiply", args)
|
MongoDB geoNear Aggregation - Order of Operations |
Here's how $geoNear works: It gets a cursor to documents whose coordinates
satisfy the maxDistance requirement. It then iterates over the cursor and
for each document checks if it matches the query requirement. If it
doesn't, it skips it and moves to the next document. It does this until it
finds limit-many documents or the end of the cursor. Note that this is the
limit argument to the $geoNear command, not the $limit operation specified
later in the aggregation pipeline.
The default limit is 100, so if you don't specify limit you are getting the
first 100 documents that match query and whose coordinates satisfy
maxDistance, sorting those 100 documents by created_at, and then taking the
first 5. When you specify limit:100000, you are getting the first 100000
documents that match query and
|
MongoDB aggregation of large amounts of data |
This projection is not a big deal; it has a minor impact on the overall
execution cost. You can run simple tests with and without this step to get
concrete numbers for your case, but, as I said, it is just one additional
step for the Aggregation Framework.
If you are grouping by date, this post might be helpful
|
MongoDB Aggregation: Counting distinct fields |
I figured this out by using the $addToSet and $unwind operators.
Mongodb Aggregation count array/set size
db.collection.aggregate([
    { $group: { _id: { account: '$account' }, vendors: { $addToSet: '$vendor' } } },
    { $unwind: "$vendors" },
    { $group: { _id: "$_id", vendorCount: { $sum: 1 } } }
]);
Hope it helps someone
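For comparison, the same distinct count can be computed in one pass over an in-memory array with a JavaScript Set (a sketch with made-up sample data, not the server-side pipeline):

```javascript
// One-pass equivalent of $addToSet + $unwind + $group: collect each
// account's vendors into a Set, then take its size.
function vendorCounts(docs) {
  const sets = {};
  for (const d of docs) {
    (sets[d.account] = sets[d.account] || new Set()).add(d.vendor);
  }
  return Object.fromEntries(
    Object.entries(sets).map(([account, s]) => [account, s.size])
  );
}

console.log(vendorCounts([
  { account: 'a', vendor: 'v1' },
  { account: 'a', vendor: 'v1' },
  { account: 'a', vendor: 'v2' },
  { account: 'b', vendor: 'v1' }
])); // { a: 2, b: 1 }
```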
|
MongoDB Aggregation: How do I recombine a date using $project? |
Assuming that, as you are grouping documents by year, month and day, hours
and minutes are useless, you can use one of those operators to get a date
sample: $first, $last, $min or $max.
Sentiment.aggregate([
{
$match: { 'content.term' : term_id }
}, {
$group: {
_id: {
year : { $year : '$created_at' },
month : { $month : '$created_at' },
dayOfMonth : { $dayOfMonth : '$created_at' },
},
dt_sample : { $first : '$created_at' },
sum : { $sum : '$score'},
count : { $sum : 1 }
},
}, {
$project: {
_id : 0,
date : '$dt_sample',
sum : 1,
count : 1,
avg : { $divide : [ '$sum', '$count' ] }
}
}
])
|
How to add a new field in aggregation in projection with blank value in mongodb |
The syntax fieldname:1 means "pass through this field as is."
You want to have a literal 1 value - the simplest way is to create an
expression that will return 1. I suggest: dpv:{$add:[1]}
|
How to overcome the limitations with mongoDB aggregation framework |
1) Saving aggregated values directly to a collection (like with MapReduce)
will be released in a future version, so the first solution is just to
wait for a while :)
2) If you hit the 2nd or 3rd limitation, you may need to redesign your
data scheme and/or aggregation pipeline. If you are working with large
time series, you can reduce the number of aggregated docs and do the
aggregation in several steps (like MapReduce does). I can't say more
concretely, because I don't know your data/use cases (give me a comment).
3) You can choose a different framework. If you are familiar with the
MapReduce concept, you can try Hadoop (it can use MongoDB as a data
source). I don't have experience with MongoDB-Hadoop integration, but I
must warn you NOT to use Mongo's own MapReduce -- it performs very poorly
on large datasets.
4) You can do aggregation
|
Is there a workaround to allow using a regex in the Mongodb aggregation pipeline |
This question seems to come up many times with no solution.
There are two possible solutions that I know of:
Solution 1 - using mapReduce. mapReduce is the general form of aggregation
that lets the user do anything imaginable and programmable.
Following is the mongo shell solution using mapReduce.
We consider the following 'st' collection:
db.st.find()
{ "_id" : ObjectId("51d6d23b945770d6de5883f1"), "foo" : "foo1", "bar" :
"bar1" }
{ "_id" : ObjectId("51d6d249945770d6de5883f2"), "foo" : "foo2", "bar" :
"bar2" }
{ "_id" : ObjectId("51d6d25d945770d6de5883f3"), "foo" : "foo2", "bar" :
"bar22" }
{ "_id" : ObjectId("51d6d28b945770d6de5883f4"), "foo" : "foo2", "bar" :
"bar3" }
{ "_id" : ObjectId("51d6daf6945770d6de5883f5"), "foo" : "foo3", "bar" :
"bar3" }
{ "_id" : ObjectId("51d6db03945770d6de5883f6"
|
How to match by 'undefined' value in MongoDB Aggregation Framework? |
If you want to filter out documents that have some fields missing, use the
$exists operator.
This works on my machine :
> db.test.drop()
true
> db.test.insert( {'Hello':'World!', 'myField':42})
> db.test.insert( {'Hello again':'World!'})
> db.test.aggregate({'$match':{ 'myField':{'$exists':false} }})
{
"result" : [
{
"_id" : ObjectId("51b9cd2a6c6a334430ec0c98"),
"Hello again" : "World!"
}
],
"ok" : 1
}
The document that has myField present does not show in the results.
|
Mongodb's Aggregation Framework with Subset and Scala |
Subset does not provide methods for performing queries against MongoDB;
its only concern is the Mongo Java Driver's method parameters, including
documents and their fields.
So once you have built the aggregation query, you can run it; the val
query should be an AggregationOutput result, on which you can call
results() to get the actual aggregation results.
See using the aggregation framework with the java driver for more
information.
|
Array intersection in mongodb with Aggregation framework |
What about this (in the mongo shell)? Simply translate it to mongoose.
db.ss.aggregate([
{$unwind: '$params'},
{$match: {params: {$in: [1,20,30,4,7]} } },
{$group: {_id: {_id:"$_id", age: "$age"}, nb: {"$sum":1} } },
{$sort: {nb:-1}},
{$limit:5},
{$project: {_id:"$_id._id", age:"$_id.age", nb: "$nb"} },
{$sort:{age:1}}
])
The first stage, $unwind, breaks up the array field so that for each _id
you have a number of documents equal to the number of elements in params,
each with a single value from the params array. $match selects the
documents corresponding to what we want. $group groups them back using the
_id and the age as key and counts the number of docs in each group; this
corresponds exactly to the number of elements in the intersection. $limit
takes the top five. $project and $sort then reshape the fields and order
the result by age.
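To see why the group count equals the intersection size, here is an in-memory JavaScript sketch of the same pipeline (the sample documents are made up):

```javascript
// unwind + match + group collapses to: count how many params fall in the
// wanted set; then $sort desc by nb, $limit, and a final $sort by age.
function topByIntersection(docs, wanted, topN) {
  const wantedSet = new Set(wanted);
  return docs
    .map(d => ({
      _id: d._id,
      age: d.age,
      nb: d.params.filter(p => wantedSet.has(p)).length
    }))
    .sort((a, b) => b.nb - a.nb)    // $sort: { nb: -1 }
    .slice(0, topN)                 // $limit
    .sort((a, b) => a.age - b.age); // final $sort: { age: 1 }
}

const top = topByIntersection(
  [
    { _id: 1, age: 30, params: [1, 20, 99] },
    { _id: 2, age: 25, params: [4, 7, 30] }
  ],
  [1, 20, 30, 4, 7],
  5
);
console.log(top.map(d => d._id)); // [ 2, 1 ]
```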
|
mongodb aggregation framework groupby multiple fields |
Map-Reduce may suit you.
E.g.:
map = function (){
emit(this.topic+this.date, 1);
}
reduce = function (id, vals){
return Array.sum(vals);
}
db.coll.mapReduce(map, reduce, {out:'results'});
http://docs.mongodb.org/manual/tutorial/map-reduce-examples/
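The map/reduce pair above can be emulated in plain JavaScript to see what it computes (the sample documents are made up):

```javascript
// map: emit(this.topic + this.date, 1); reduce: Array.sum(vals).
// Emulated here by bucketing emitted 1s per key and summing each bucket.
function countByTopicAndDate(docs) {
  const emitted = {};
  for (const d of docs) {
    const key = d.topic + d.date;
    (emitted[key] = emitted[key] || []).push(1);
  }
  const results = {};
  for (const [key, vals] of Object.entries(emitted)) {
    results[key] = vals.reduce((a, b) => a + b, 0);
  }
  return results;
}

console.log(countByTopicAndDate([
  { topic: 'news', date: '2013-01-01' },
  { topic: 'news', date: '2013-01-01' },
  { topic: 'tech', date: '2013-01-01' }
]));
// { 'news2013-01-01': 2, 'tech2013-01-01': 1 }
```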
|
Use MongoDB aggregation to find set intersection of two sets within the same document |
So here is a solution not using the aggregation framework. It uses the
$where operator and JavaScript. This feels much clunkier to me, but it
seems to work, so I wanted to put it out there in case anyone else comes
across this question.
db.houses.find({'$where':
function() {
var ownSet = {};
var useSet = {};
for (var i=0;i<obj.uses.length;i++){
useSet[obj.uses[i].name] = true;
}
for (var i=0;i<obj.rooms.length;i++){
var room = obj.rooms[i];
for (var j=0;j<room.owns.length;j++){
ownSet[room.owns[j].name] = true;
}
}
for (var prop in ownSet) {
if (ownSet.hasOwnProperty(prop)) {
if (!useSet[prop]){
return true;
}
}
}
return false;
}
})
|
How to order MongoDB Aggregation with match, sort, and limit |
Put the $sort before the $group, otherwise MongoDB can't use the index to
help with sorting.
However, in your query it looks like you want to query for a relatively
small number of user_ids compared to the total size of your group_members
collection. So I recommend an index on user_id only. In that case MongoDB
will have to sort your results in memory by last_post_at, but this is
worthwhile in exchange for using an index for the initial lookup by
user_id.
|
Get Size of Array Intersection in MongoDB Aggregation Framework |
If I understand your question, you have data something like the following:
db.users.insert({_id: 100, likes: [
'pina coladas',
'long walks on the beach',
'getting caught in the rain'
]})
db.users.insert({_id: 101, likes: [
'cheese',
'bowling',
'pina coladas'
]})
db.users.insert({_id: 102, likes: [
'pina coladas',
'long walks on the beach'
]})
db.users.insert({_id: 103, likes: [
'getting caught in the rain',
'bowling'
]})
db.users.insert({_id: 104, likes: [
'pina coladas',
'long walks on the beach',
'getting caught in the rain'
]})
and you wish to compute for a given user how many matching features
('likes' in this example) they have with other users? The following
aggregation pipeline will accomplish this:
user = 100
user_likes = db.u
|
Use mongodb aggregation framework to group by length of array |
Ok, got it! Here we go. The aggregation pipeline is basically this:
{
$unwind: "$saved_things"
},
{
$group: {
_id: "$_id",
size: {
$sum: 1
}
}
},
{
$group: {
_id: "$size",
frequency: {
$sum: 1
}
}
},
{
$project: {
size: "$_id",
frequency: 1,
_id: 0
}
}
Unwind the saved_things array, then group by document _id and count; this
gives us each array's size. Now it's easy: group by size and count the
frequency. Use $project to rename the _id field to size.
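An in-memory JavaScript sketch of the whole pipeline (the sample documents are made up; note that, unlike $unwind, this version also counts empty arrays):

```javascript
// Size of each document's saved_things array ($unwind + first $group),
// then a frequency table over those sizes (second $group).
function sizeFrequencies(docs) {
  const freq = {};
  for (const d of docs) {
    const size = d.saved_things.length;
    freq[size] = (freq[size] || 0) + 1;
  }
  return freq;
}

console.log(sizeFrequencies([
  { saved_things: [1, 2] },
  { saved_things: [1, 2, 3] },
  { saved_things: [4, 5] }
])); // { '2': 2, '3': 1 }
```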
|