MongoDB

Anup Singh
Feb 6, 2022

Data is stored in records called documents (Document model)

NoSQL Databases

4 Families

  • Key value (Access data using a primary key; the key decides which server holds the data) (Redis, Dynamo, Riak)
  • Graph (Relations between records, which would need self joins in SQL) (Neo4j, InfiniteGraph, OrientDB, FlockDB)
  • Column oriented (Polymorphic data) (HBase, Cassandra, Hypertable), widely used to manage data warehouses, business intelligence, CRM, and library card catalogs
  • Document oriented (Easy and natural representation, like hashes, maps, and arrays) (Amazon SimpleDB, CouchDB, MongoDB, Riak, Lotus Notes)

Modern Applications

  • Polymorphic data

The main difference from a relational table is that each document can have a different list of fields, allowing for documents with different shapes (polymorphism) in a collection.

  • Resilient and always up
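To make the polymorphism point concrete, here is a minimal sketch in plain JavaScript. The `products` array stands in for a collection, and every field name and value is made up for illustration; nothing here talks to a real database.

```javascript
// Hypothetical "products" collection, held in a plain array for illustration.
const products = [
  { _id: 1, type: "book", title: "MongoDB Basics", pages: 320 },
  { _id: 2, type: "shirt", title: "Logo Tee", size: "M", colors: ["black", "white"] },
];

// Collect the field names of each document to show that the shapes differ.
const shapes = products.map((doc) => Object.keys(doc).sort().join(","));
console.log(shapes);
```

Both documents sit in the same collection even though only one has `pages` and only the other has `size` and `colors`.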

Terminologies

  • Table => Collection
  • Parent/Child Tables => Nested Sub-Document or Array
  • Index => Index
  • Row => Document
  • Column => Field
  • Join => Embedding, Linking / Lookup
  • View => Read only view / On Demand Materialized view
  • Multi Record ACID Transactions => Multi Document ACID transactions

As with traditional relational databases, indexes are among the most important things to look at in MongoDB to ensure good performance.

Document Model

Each document can have different shapes (fields)

Embedding (One to One relationship)

Array (One to many Relationship)

Model relationships with sub documents and arrays

Keep the information used together stored together

Documents map easily to the data structures in your code
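As a rough illustration of the points above, the sketch below shows one document that embeds a sub-document for a one-to-one relationship and an array for a one-to-many relationship. The `person` document and all of its fields are hypothetical.

```javascript
// Hypothetical "person" document; names and values are illustrative only.
const person = {
  _id: 1,
  name: "Asha",
  // One-to-one: the address is embedded as a sub-document.
  address: { city: "Pune", zip: "411001" },
  // One-to-many: phone numbers are kept in an array of sub-documents.
  phones: [
    { kind: "home", number: "555-0100" },
    { kind: "work", number: "555-0101" },
  ],
};

// Everything used together is read from the one document, with no join.
const summary = `${person.name}, ${person.address.city}, ${person.phones.length} phones`;
console.log(summary);
```

Because the document is already shaped like an object in application code, no mapping layer is needed to read it.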

Modeling a MongoDB database

  • The document model lets, and encourages, you to keep information that is used together inside one single document.
  • Atomicity, consistency, isolation, and durability (ACID) can be implemented by leveraging the document model, which is preferred, or by using MongoDB transactions.
  • Making early decisions and compromises about very large datasets will help make the data more manageable. For example, deleting or archiving documents after a given period of time may reduce the resources needed for the project.
  • Using a distributed system, like MongoDB, means that data can be written and read on servers all over the world. Carefully planning the location of the servers will improve performance and resilience of the applications.

Methodology

  • Describe and understand the workload (the solution depends on the workload)
Understand: what operations we are modeling for
Quantify and qualify: read and write operations
Outputs: list of operations, and which of these are most important

Model the relationship

Making the choice between embedding and linking documents

Apply schema design patterns

performance, maintenance and simplicity requirements (similar to SQL normalization)

Decision based on simplicity or performance (Should the data be embedded, aggregated, or pre-computed?)

Understand the frequency of queries to make good decisions

Embedding vs Referencing

Embed => for read operations, for write operations on one-to-one and one-to-many relationships, and for data that is deleted together at a given time

Reference =>

  • when the many side is a huge number
  • for integrity in write operations in many-to-many relationships
  • when one piece is frequently used but the other is not, and memory is an issue

Prefer embedding over referencing → better performance
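The trade-off above can be sketched with in-memory data. This is only an illustration under assumed names (`order`, `product`, `reviews`, `productId`); in a real database the second lookup would be a second query or a `$lookup` stage.

```javascript
// Embedded: line items live inside the order and are read with it.
const order = {
  _id: 100,
  customer: "Asha",
  items: [
    { sku: "A1", qty: 2 },
    { sku: "B7", qty: 1 },
  ],
};

// Referenced: each review stores the _id of the product it belongs to,
// because a product could collect a very large number of reviews.
const product = { _id: 7, name: "Kettle" };
const reviews = [
  { _id: 1, productId: 7, stars: 5 },
  { _id: 2, productId: 7, stars: 3 },
  { _id: 3, productId: 9, stars: 4 },
];

// Resolving a reference takes a second lookup.
const productReviews = reviews.filter((r) => r.productId === product._id);
console.log(order.items.length, productReviews.length);
```

The embedded order needs one read to get everything, while the referenced reviews stay out of the product document and are fetched only when needed.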

Schema Validation

  • JSON schema
  • Support BSON types
  • Validation and polymorphism
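A minimal sketch of what a validator can look like, assuming a hypothetical `people` collection with `name` and `age` fields. The object below only builds the options; the commented line shows how it would be passed to `db.createCollection()` in mongosh.

```javascript
// Options object in the shape accepted by db.createCollection() in mongosh.
// Collection and field names are hypothetical.
const peopleOptions = {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "age"],
      properties: {
        name: { bsonType: "string", description: "must be a string" },
        age: { bsonType: "int", minimum: 0, description: "must be a non-negative integer" },
      },
    },
  },
};

// In mongosh this would be applied with:
//   db.createCollection("people", peopleOptions)
console.log(JSON.stringify(peopleOptions.validator.$jsonSchema.required));
```

Only the listed fields are constrained here, so documents can still carry extra fields and keep different shapes.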

MongoDB Query Language (MQL)

  • Designed to query documents within a single collection
db.<collection>.find({<field1>: <value1>, <field2>: {<operator>: <value2>}})
db.people.find(
{
"age": { $lt: 30 }
}
)
db.people.find().count()
db.people.find().limit(1)
db.people.find().sort({ "age": 1 })

Mongo Aggregation Framework

  • Breaks complex operations into stages
  • Operators are called within stages
  • Imperative Language
  • Uses Sequential Stages in contrast to declarative SQL which uses nested statements
db.cars.aggregate([
  {
    $match: { "make": "Fiat" } // filter
  },
  {
    $project: { // projection
      "_id": 1,
      "owner": 1,
      "make": 1,
      "Wheels": 1
    }
  }
])
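To show what "sequential stages" means, here is a toy in-memory pipeline runner in plain JavaScript. It is not how MongoDB implements aggregation; `runPipeline`, the `stages` table, and the sample `cars` data are all made up, and the toy stages support only equality `$match` and inclusion `$project`.

```javascript
// Toy stage implementations: equality-only $match and inclusion-only $project.
const stages = {
  $match: (docs, spec) =>
    docs.filter((d) => Object.entries(spec).every(([k, v]) => d[k] === v)),
  $project: (docs, spec) =>
    docs.map((d) =>
      Object.fromEntries(Object.entries(d).filter(([k]) => spec[k] === 1))
    ),
};

// Run each stage in order, feeding the output of one into the next.
function runPipeline(docs, pipeline) {
  return pipeline.reduce((current, stage) => {
    const [name, spec] = Object.entries(stage)[0];
    return stages[name](current, spec);
  }, docs);
}

// Hypothetical sample data.
const cars = [
  { _id: 1, owner: "Asha", make: "Fiat", wheels: 4, color: "red" },
  { _id: 2, owner: "Ben", make: "Ford", wheels: 4, color: "blue" },
];

const result = runPipeline(cars, [
  { $match: { make: "Fiat" } },
  { $project: { _id: 1, owner: 1, make: 1 } },
]);
console.log(result);
```

Each stage sees only the documents produced by the stage before it, which is why putting `$match` early keeps later stages cheap.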

SQL Terms, Functions, and Concepts vs MongoDB Aggregation Framework Stages, Operators, and Concepts

  • SQL => MongoDB
  • SELECT => db.<collection>.aggregate() ($project picks the fields)
  • WHERE => $match
  • GROUP BY => $group
  • COUNT() => $count
  • LIMIT => $limit
  • ORDER BY => $sort
  • JOIN => $lookup
  • UNION ALL => $unionWith
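The mapping above can be read off a small example. The SQL statement in the comment and the pipeline below are intended to be equivalent; the `cars` table, its columns, and the `total` alias are assumptions for illustration. The block only builds the pipeline array, which in mongosh would be passed to `db.cars.aggregate(pipeline)`.

```javascript
// SQL (hypothetical table and columns):
//   SELECT make, COUNT(*) AS total
//   FROM cars
//   WHERE wheels = 4
//   GROUP BY make
//   ORDER BY total DESC
//   LIMIT 5
const pipeline = [
  { $match: { wheels: 4 } },                          // WHERE
  { $group: { _id: "$make", total: { $sum: 1 } } },   // GROUP BY + COUNT(*)
  { $sort: { total: -1 } },                           // ORDER BY ... DESC
  { $limit: 5 },                                      // LIMIT
];

// List the stage names in the order they will run.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
```

Counting per group is written here as `$sum: 1` inside `$group`, which is the usual way to express `COUNT(*)` with `GROUP BY`.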

ACID with Documents and Transactions

$lookup operator

BSON -> Binary version of JSON

“int”, “long”, “double” and “decimal” for numerical values

Distributed Database Considerations

  • Replica set
  • Sharded cluster

CAP Theorem

write concerns (durability guarantee for write operation)

  • Acknowledgement
  • write concern of majority
  • always writes to primary

read concerns (guarantee that read operation will get durable data)

  • read concern local

read preference (preferred node to read from)

  • nearest
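As a sketch of how these three settings are expressed, the block below builds the option values in plain JavaScript and shows in comments how they would be used in mongosh. The `people` collection and the 5000 ms timeout are arbitrary illustrative choices.

```javascript
// Write concern: wait until a majority of replica set members acknowledge the write.
// The 5000 ms timeout is an arbitrary illustrative value.
const writeOptions = { writeConcern: { w: "majority", wtimeout: 5000 } };

// Read concern and read preference, as named in the notes above.
const readConcernLevel = "local";
const readPreferenceMode = "nearest";

// In mongosh these would be used roughly as:
//   db.people.insertOne({ name: "Asha" }, writeOptions)
//   db.people.find().readConcern(readConcernLevel).readPref(readPreferenceMode)
console.log(writeOptions, readConcernLevel, readPreferenceMode);
```

Write concern is set per write, while read concern and read preference are set on the read, so each can be tuned independently for a given operation.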

global cluster

  • zone sharding

Similarity

ACID (Atomicity, Consistency, Isolation, and Durability)

High Availability (support applications through a group of servers to minimize down time)

Replication (Keeping many copies of each data piece on different servers by replicating the data to achieve high availability and fault tolerance)

Pros

Scalability

NoSQL databases are non-relational and designed with web applications in mind, so they generally scale out more easily than relational databases.

Relational DB

Bloated data => Normalize

Data for a single thing spreads out across dozens of tables

  • Hard to understand
  • Inefficient to pull data from multiple tables

MongoDB Atlas

  • MongoDB Admin Course
  • Free-tier
  • 512 MB storage
