One Year of MongoDB

Soon after ShareLaTeX joined Overleaf, I got to work with their MongoDB database. Coming with a relational databases experience (MySQL then PostgreSQL), I had a lot to learn!

Here are some things I had gathered after a year, extracted from a talk I gave on 2018-09-19

Basics #

Shell #

One connects to MongoDB using the shell:"

$ mongo mongodb://path/to/db
> db.users.find(query, projection)
> db.users.update(query, update, options)

Shell History

The shell history is stored in cat ~/.dbshell.

Querying #

The query is a JS object:

check equality with { attribute: value }
advanced check with { attribute: Object } (Object is a JS object with depth of one or more)
or sometimes { Operator: { attribute: value } }

Operators

$ne: { attribute: { $ne: value } }
$gt: { attribute: { $gt: value } }
$in: { attribute: { $in: [value1, value2] } }
$exists: { attribute: { $exists: true } }
$or: { $or: [{ attribute: value1 }, { attribute: value2 }] }

Pretty output #

findOne gives you a prettier output by default, but find() doesn’t. find().pretty() will help with that.

Pitfalls #

Update/Delete records #

Update Multiple records

By default, update will update a single document.

db.test.update(query, update, { multi: true })
- Default: { multi: false }
db.test.updateMany(query, update)

Delete Multiple records

By default, remove will remove all documents.

db.collection.remove(query, options)
- Default: { justOne: false }
db.collection.deleteMany(query, options)

Delete a single record

db.collection.remove(query, { justOne: true })
db.collection.delete(query, options)

Array Size Range #

Size takes a number:

> db.users.findOne({ emails: { $size: 4 }})
// OK

But not a range:

> db.users.findOne({ emails: { $size: { $gte: 4 } }})
Error: error: {
	"ok" : 0,
	"errmsg" : "$size needs a number",
	"code" : 2,
	"codeName" : "BadValue"
}

Workaround

> db.users.findOne({ 'emails.4': { $exists: true }})
// OK

Indexes #

Nulls #

By default, index are indexing documents even if the attribute is null or nonexistent. Problems:

this is a waste of space and memory
this doesn’t work with uniqueness

> db.tests.createIndex(
	{ one: 1 },
	{
		name: 'tests_index',
		unique: true
	}
)
> db.tests.insert({one: 1})
> db.tests.insert({one: 1})
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 11000,
		"errmsg" : "E11000 duplicate key error collection: db.tests index: tests_index dup key: { : 1.0 }"
	}
})
> db.tests.insert({two: 2})
> db.tests.insert({two: 2})
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 11000,
		"errmsg" : "E11000 duplicate key error collection: db.tests index: tests_index dup key: { : null }"
	}
})

The solution is to use partial or sparse indexes.

Partial Indexes #

> db.tests.createIndex(
	{ one: 1 },
	{
		name: 'tests_index_partial',
		unique: true,
		partialFilterExpression: {
			'one': { $exists: true }
		}
	}
)
> db.tests.insert({one: 1})
> db.tests.insert({one: 1})
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 11000,
		"errmsg" : "E11000 duplicate key error collection: db.tests index: tests_index_partial dup key: { : 1.0 }"
	}
})
> db.tests.insert({two: 2})
> db.tests.insert({two: 2})
// OK

⚠️ To use the partial index, a query must contain the filter expression (or a modified filter expression that specifies a subset of the filter expression) as part of its query condition.

Sparse Indexes #

A subset of partial indexes

> db.tests.createIndex(
	{ one: 1 },
	{
		name: 'tests_index_sparse',
		unique: true,
		sparse: true
	}
)
> db.tests.insert({one: 1})
> db.tests.insert({one: 1})
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 11000,
		"errmsg" : "E11000 duplicate key error collection: db.tests index: tests_index_sparse dup key: { : 1.0 }"
	}
})
> db.tests.insert({two: 2})
> db.tests.insert({two: 2})
// OK

Explain #

`.explain()` #

> db.tests.find({one: 1}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "db.tests",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"one" : {
				"$eq" : 1
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"one" : 1
				},
				"indexName" : "tests_index_sparse",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"one" : [ ]
				},
				"isUnique" : true,
				"isSparse" : true,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"one" : [
						"[1.0, 1.0]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		// ...
	},
	"ok" : 1
}

`.explain("executionStats")` #

> db.tests.find({one: 1}).explain("executionStats")
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "db.tests",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"one" : {
				"$eq" : 1
			}
		},
		"winningPlan" : {
			// ...
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1,
		"executionTimeMillis" : 0,
		"totalKeysExamined" : 1,
		"totalDocsExamined" : 1,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 1,
			"executionTimeMillisEstimate" : 0,
			"works" : 2,
			"advanced" : 1,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 0,
			"restoreState" : 0,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 1,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 1,
				"executionTimeMillisEstimate" : 0,
				"works" : 2,
				"advanced" : 1,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"one" : 1
				},
				"indexName" : "tests_index_sparse",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"one" : [ ]
				},
				"isUnique" : true,
				"isSparse" : true,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"one" : [
						"[1.0, 1.0]"
					]
				},
				"keysExamined" : 1,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
			// ...
	},
	"ok" : 1
}

Collations #

For case-insensitive indexes or queries.

Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.

db.users.createIndex(
	{ email: 1 },
	{
		name: "email_case_insensitive",
		collation: {
			locale: "en",
			strength: 2
		},
		unique: true,
		background: true
	}
)

Strength levels: #

Value	Description
1	Primary level of comparison. Collation performs comparisons of the base characters only, ignoring other differences such as diacritics and case.
2	Secondary level of comparison. Collation performs comparisons up to secondary differences, such as diacritics. That is, collation performs comparisons of base characters (primary differences) and diacritics (secondary differences). Differences between base characters takes precedence over secondary differences.
3	Tertiary level of comparison. Collation performs comparisons up to tertiary differences, such as case and letter variants. That is, collation performs comparisons of base characters (primary differences), diacritics (secondary differences), and case and variants (tertiary differences). Differences between base characters takes precedence over secondary differences, which takes precedence over tertiary differences. This is the default level.
4	Quaternary Level. Limited for specific use case to consider punctuation when levels 1-3 ignore punctuation or for processing Japanese text.
5	Identical Level. Limited for specific use case of tie breaker.

⚠️ You must specify the same collation at the query level in order to use the index-level collation.

💡 You can use one index with collation for uniqueness constraint and another for querying.

Aggregates #

Only a Postgres developer would try that!
— John

Definition #

 db.collection.aggregate(pipeline, options)

pipeline is an Array of ‘stages’: [ { <stage> }, ... ]
stage is an Object with one key: the stage name. Its value describes the stage.

Stages #

{ $addFields: { <newField>: <expression>, ... } } Pass a document to the next stage after adding some attributes
{ $project: { <specification(s)> } } Similar to $addFields but only pass the specified attributes to the next stage
{ $match: { <query> } } Filter the documents. Pass matching documents to the next stage
{ $lookup: { from: <collection to join>, localField: <field from the input documents>, foreignField: <field from the documents of the "from" collection>, as: <output array field> } } Joins!

Tip: To complex queries within the Mongo shell, on can define functions, editable with the default text editor:

edit usersAggregate
usersAggregate()

Aggregate Example #

function userAggregate() {
  return db.users.aggregate([
    { // STAGE 1
      $addFields:
      {
        lowercaseEmail: { $toLower: "$email" },
        isLowercase: { $eq: ["$email", { $toLower: "$email" }] }
      }
    },
    { // STAGE 2
      $match:
      {
        isLowercase: false
      }
    },
    { // STAGE 3
      $lookup:
      {
        from: "users",
        localField: "lowercaseEmail",
        foreignField: "email",
        as: 'usersLookup'
      }
    },
    { // STAGE 4
      $addFields:
      {
        isDuplicate: { $gt: [{ $size: "$usersLookup" }, 0] }
      }
    },
    { // STAGE 5
      $project:
      {
        "_id": 1,
        email: 1,
        newEmail: {
          $cond: { if: "$isDuplicate", then: { $concat: ['downcased___', "$lowercaseEmail"] }, else: "$lowercaseEmail" }
        }
      }
    }
  ])
}

# Mongodb # Sql

Tim Alby

I'm a software engineer working freelance as a full-stack web developer. My most common languages are PostgreSQL, Node, Rails and React.