MongoDB Basics

From trapsink.com


Introduction

MongoDB is a document-oriented database; it uses JSON-style documents (stored internally as BSON) not only to store data but also to interact with the system itself. Many commands may look a little odd coming from a MySQL background, but in general the concepts behind what you're trying to do are much the same. The 10gen website has a great page detailing how to apply your MySQL knowledge to the MongoDB world:

SQL Terms/Concepts            MongoDB Terms/Concepts
database                      database
table                         collection
row                           document or BSON document
column                        field
index                         index
table joins                   embedded documents and linking
primary key                   primary key (the _id field)
aggregation (e.g. group by)   aggregation pipeline (see the SQL to Aggregation Mapping Chart)

A few select examples from the linked website:

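SQL Select Statements                                         MongoDB find() Statements
SELECT * FROM users                                           db.users.find()
SELECT id, user_id, status FROM users                         db.users.find( { }, { user_id : 1, status : 1 } )
SELECT * FROM users WHERE status = "A" ORDER BY user_id DESC  db.users.find( { status : "A" } ).sort( { user_id : -1 } )

The _id field (the equivalent of id here) is returned by find() unless it is explicitly excluded, which is why the projection in the second example does not list it.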
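To make the mapping above concrete, here is a minimal sketch of a hypothetical toFindArgs() helper (not part of MongoDB). It turns a simple SELECT description into the filter, projection and sort documents that db.collection.find() and .sort() accept. To keep it short, it assumes WHERE clauses are plain equality tests. It is written in plain ES5 JavaScript, so it runs under Node and should also paste into the mongo shell.

```javascript
// Hypothetical helper (not a MongoDB API): map a simple SELECT description
// onto the arguments of db.collection.find() and cursor.sort().
// Assumes WHERE clauses are plain equality tests.
function toFindArgs(spec) {
  var filter = {};
  var projection = null; // null means "all fields", like SELECT *
  var sort = null;       // null means "no ORDER BY"
  var key, i;

  // WHERE col = value  ->  { col : value }
  var where = spec.where || {};
  for (key in where) {
    if (where.hasOwnProperty(key)) {
      filter[key] = where[key];
    }
  }

  // SELECT a, b  ->  { a : 1, b : 1 }
  // _id is returned unless excluded, so it never needs to be listed.
  if (spec.columns && spec.columns.length > 0) {
    projection = {};
    for (i = 0; i < spec.columns.length; i++) {
      projection[spec.columns[i]] = 1;
    }
  }

  // ORDER BY col ASC|DESC  ->  { col : 1 | -1 }, ASC when unspecified
  if (spec.orderBy && spec.orderBy.length > 0) {
    sort = {};
    for (i = 0; i < spec.orderBy.length; i++) {
      var direction = String(spec.orderBy[i][1] || "ASC").toUpperCase();
      sort[spec.orderBy[i][0]] = direction === "DESC" ? -1 : 1;
    }
  }

  return { filter: filter, projection: projection, sort: sort };
}

// SELECT user_id, status FROM users WHERE status = "A" ORDER BY user_id DESC
var args = toFindArgs({
  columns: ["user_id", "status"],
  where: { status: "A" },
  orderBy: [["user_id", "DESC"]]
});
console.log(JSON.stringify(args));
// {"filter":{"status":"A"},"projection":{"user_id":1,"status":1},"sort":{"user_id":-1}}
```

In the mongo shell the result would be used as db.users.find(args.filter, args.projection).sort(args.sort). That is the third row of the table with a column list added.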

Using the .explain() method in MongoDB actually runs the query, which is exactly the opposite of MySQL's EXPLAIN. Be very careful not to use .explain() with any sort of data-altering command (think UPDATE / INSERT / DELETE in MySQL).


Fundamentals

Installation and Updates

RHEL / CentOS

Utilize the standard Yum repository style configuration:

# vi /etc/yum.repos.d/10gen.repo
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1
# yum install mongo-10gen mongo-10gen-server


Ubuntu / Debian

Utilize the standard APT sources style configuration:

# apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
# echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' >> /etc/apt/sources.list.d/10gen.list
# apt-get update
# apt-get install mongodb-10gen


User Management

MongoDB uses role-based access control on a per-database basis. The system.users collection holds data that roughly corresponds to the mysql.user table in MySQL, although it is not manipulated in quite the same way. The 10gen website has great introductory material on Access Control.

Authentication is disabled by default in an out-of-the-box installation! Refer to the Access Control documentation and tutorials for basic user administration tasks, whether users need to be created or repaired. Most production configurations will already have had security practices applied.


Network Connectivity

MongoDB Default Ports

27017

  • default port for mongod and mongos instances
  • change the port with --port / port
  • bind to a specific address with --bind_ip / bind_ip
  • define the replica set name with --replSet / replSet
  • set the data directory with --dbpath / dbpath

27018

  • default port when running with --shardsvr / shardsvr

27019

  • default port when running with --configsvr / configsvr

28017

  • default port for the web status page (HTTP interface)
  • always listens on the mongod port + 1000 (so 27017 becomes 28017)
  • disable with --nohttpinterface / nohttpinterface
  • no authentication by default
  • enable the simple REST interface with --rest / rest
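The port rules in the list above boil down to a few lines. This is a sketch of a hypothetical defaultPorts() function that mirrors the 2.4-era defaults described here: --configsvr, --shardsvr, an explicit --port, and the + 1000 offset for the web status page. It is only a mnemonic, not something mongod itself exposes.

```javascript
// Hypothetical mnemonic for the 2.4-era default ports listed above.
// opts mirrors the command-line flags: { port: Number, shardsvr: Boolean, configsvr: Boolean }
function defaultPorts(opts) {
  opts = opts || {};
  var port;
  if (typeof opts.port === "number") {
    port = opts.port;   // an explicit --port / port overrides the defaults
  } else if (opts.configsvr) {
    port = 27019;       // --configsvr / configsvr
  } else if (opts.shardsvr) {
    port = 27018;       // --shardsvr / shardsvr
  } else {
    port = 27017;       // plain mongod or mongos
  }
  // The web status page listens on the main port + 1000.
  return { port: port, httpStatusPort: port + 1000 };
}

console.log(JSON.stringify(defaultPorts()));                   // {"port":27017,"httpStatusPort":28017}
console.log(JSON.stringify(defaultPorts({ shardsvr: true }))); // {"port":27018,"httpStatusPort":28018}
```

For the three-member test replica set later on this page (ports 27001 to 27003), this suggests the status pages would be on ports 28001 to 28003.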


System Level Software Configuration

Vendor packages place the default configuration files, service scripts and data directories in each platform's standard locations. Subtle differences exist between the platforms:

Red Hat / CentOS

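  • /etc/mongod.conf (server configuration file)
  • /etc/sysconfig/mongod (options read by the init script)
  • /etc/rc.d/init.d/mongod (init script)
  • /var/log/mongo/mongod.log (server log)
  • /var/lib/mongo/ (data directory)
  • ~/.mongorc.js (per-user mongo shell startup script)

Ubuntu / Debian

  • /etc/mongodb.conf (server configuration file)
  • /etc/init/mongodb.conf (Upstart job)
  • /etc/init.d/mongodb (init script)
  • /var/log/mongodb/mongodb.log (server log)
  • /var/lib/mongodb/ (data directory)
  • ~/.mongorc.js (per-user mongo shell startup script)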


Intermediate Troubleshooting

Kill errant MongoDB Thread

Killing an errant thread in MongoDB is directly analogous to killing one in MySQL: you examine the list of running operations, find the one in question and issue a command to terminate it.

Do not kill threads that are compacting databases, or background threads that are building indexes; this can lead to database corruption.

First, use the db.currentOp() mongo shell command to list your threads; this is analogous to SHOW FULL PROCESSLIST in MySQL.

$ mongo
MongoDB shell version: 2.4.5
connecting to: test
> db.currentOp()
{
    "inprog" : [
        {
            "opid" : 2506233,
            "active" : true,
            "secs_running" : 140,
            "op" : "update",
            "ns" : "generators.sensor_readings",
            "query" : {
                "$where" : "function(){sleep(500);return false;}"
            },
            "client" : "127.0.0.1:51773",
            "desc" : "conn20",
            "threadId" : "0x7f694753d700",
            "connectionId" : 20,
            "locks" : {
                "^" : "w",
                "^generator" : "W"
            },
            "waitingForLock" : false,
            "numYields" : 279,
            "lockStats" : {
                "timeLockedMicros" : {
                    "r" : NumberLong(0),
                    "w" : NumberLong(280242564)
                },
                "timeAcquiringMicros" : {
                    "r" : NumberLong(0),
                    "w" : NumberLong(140420592)
                }
            }
        },
        {
            "opid" : 2507691,
            "active" : false,
            "op" : "query",
            "ns" : "",
            "query" : {
                 
            },
            "client" : "127.0.0.1:51772",
            "desc" : "conn19",
            "threadId" : "0x7f6962e4a700",
            "connectionId" : 19,
            "locks" : {
                "^generator" : "R"
            },
            "waitingForLock" : true,
            "numYields" : 0,
            "lockStats" : {
                "timeLockedMicros" : {
                     
                },
                "timeAcquiringMicros" : {
                     
                }
            }
        }
    ]
}


In the example above we see two threads; the fields to look at are waitingForLock, secs_running and op. The thread we're looking for is the first one, opid 2506233. It has been running an update for 140 seconds and is the one locking up our database; the second thread (waitingForLock : true) is queued behind it. Notice, however, the W in its locks subdocument, meaning it holds a write lock on the database. We kill it with the db.killOp() command only if we're sure the data it is writing can be lost. This is a dangerous operation and should be examined carefully. Read operations are generally safe to kill in an emergency.

> db.killOp(2506233);
{ "info" : "attempting to kill op" }
> db.currentOp()
{ "inprog" : [ ] }
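On a busy server the inprog array can be long. The following is a sketch of a hypothetical findKillCandidates() helper that applies the reasoning above to db.currentOp() output. It skips idle or lock-waiting operations, ignores anything younger than a threshold, and flags operations holding an exclusive W lock. The index-build and compaction check is an assumed heuristic: it matches a system.indexes namespace or a compact/createIndexes command. Treat the results as a shortlist to review by hand, not a list to pass straight to db.killOp().

```javascript
// Hypothetical triage helper for db.currentOp() output (not a MongoDB API).
function findKillCandidates(currentOp, minSecs) {
  var candidates = [];
  var ops = (currentOp && currentOp.inprog) || [];

  for (var i = 0; i < ops.length; i++) {
    var op = ops[i];
    var query = op.query || {};
    var ns = op.ns || "";

    // Idle or queued operations are victims, not culprits.
    if (!op.active || op.waitingForLock) continue;
    // Ignore anything younger than the threshold.
    if ((op.secs_running || 0) < minSecs) continue;
    // Assumed heuristic: leave index builds and compaction alone.
    if (/\.system\.indexes$/.test(ns) || query.compact || query.createIndexes) continue;

    // An uppercase "W" in the locks subdocument is an exclusive write lock.
    var holdsWriteLock = false;
    for (var name in op.locks || {}) {
      if (op.locks[name] === "W") holdsWriteLock = true;
    }

    candidates.push({
      opid: op.opid,
      op: op.op,
      ns: ns,
      secsRunning: op.secs_running,
      holdsWriteLock: holdsWriteLock
    });
  }

  // Longest-running first.
  candidates.sort(function (a, b) { return b.secsRunning - a.secsRunning; });
  return candidates;
}

// Trimmed copy of the db.currentOp() output shown above.
var sample = {
  inprog: [
    { opid: 2506233, active: true, secs_running: 140, op: "update",
      ns: "generators.sensor_readings",
      query: { $where: "function(){sleep(500);return false;}" },
      locks: { "^": "w", "^generator": "W" }, waitingForLock: false },
    { opid: 2507691, active: false, op: "query", ns: "", query: {},
      locks: { "^generator": "R" }, waitingForLock: true }
  ]
};
console.log(JSON.stringify(findKillCandidates(sample, 60)));
// [{"opid":2506233,"op":"update","ns":"generators.sensor_readings","secsRunning":140,"holdsWriteLock":true}]
```

In the mongo shell the call would be findKillCandidates(db.currentOp(), 60). The helper deliberately never calls db.killOp() itself.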



Check Replica Status

Somewhat like MySQL, replication relies on two configurations working together. First, the core mongod process must be started with a config file option or command-line flag telling it which replica set it belongs to. The set itself must then be configured from the mongo shell. The option is the replSet keyword and can be any string, as long as all instances (processes) share the same name. For example, here are three processes started on the same server for testing a replica set:

# mongod --dbpath 1 --port 27001 --smallfiles --oplogSize 50 --logpath 1.log --logappend --fork --replSet w4
# mongod --dbpath 2 --port 27002 --smallfiles --oplogSize 50 --logpath 2.log --logappend --fork --replSet w4
# mongod --dbpath 3 --port 27003 --smallfiles --oplogSize 50 --logpath 3.log --logappend --fork --replSet w4
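Once the three processes are up, the set still needs a configuration. Instead of calling rs.initiate() and then rs.add() for each member, you can pass rs.initiate() a single document that lists every member. The sketch below builds that document with a hypothetical buildReplSetConfig() helper. The host name mongo1c comes from the rs.status() output on this page and should be replaced with your own.

```javascript
// Hypothetical helper: build a replica set configuration document for
// members that share one host and differ only by port, as in the
// three mongod commands above.
function buildReplSetConfig(setName, host, ports) {
  var members = [];
  for (var i = 0; i < ports.length; i++) {
    // Each member needs a unique _id within the set.
    members.push({ _id: i, host: host + ":" + ports[i] });
  }
  // The document's _id must match the --replSet name.
  return { _id: setName, members: members };
}

var cfg = buildReplSetConfig("w4", "mongo1c", [27001, 27002, 27003]);
console.log(JSON.stringify(cfg));
// {"_id":"w4","members":[{"_id":0,"host":"mongo1c:27001"},{"_id":1,"host":"mongo1c:27002"},{"_id":2,"host":"mongo1c:27003"}]}
```

From a mongo shell connected to any one of the members, this would then be applied with rs.initiate(cfg).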


Once the Replica set is initialized and configured (using rs.initiate() and rs.add() / rs.reconfig() commands), checking the status is done from any member of the set using the rs.status() command:

$ mongo --port 27002
MongoDB shell version: 2.4.5
connecting to: 127.0.0.1:27002/test
w4:PRIMARY> rs.status()
{
        "set" : "w4",
        "date" : ISODate("2013-08-19T18:53:23Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 1,
                        "name" : "mongo1c:27002",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 586,
                        "optime" : Timestamp(1376937880, 1),
                        "optimeDate" : ISODate("2013-08-19T18:44:40Z"),
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "mongo1c:27003",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 584,
                        "optime" : Timestamp(1376937880, 1),
                        "optimeDate" : ISODate("2013-08-19T18:44:40Z"),
                        "lastHeartbeat" : ISODate("2013-08-19T18:53:21Z"),
                        "lastHeartbeatRecv" : ISODate("2013-08-19T18:53:21Z"),
                        "pingMs" : 0,
                        "syncingTo" : "mongo1c:27002"
                },
                {
                        "_id" : 3,
                        "name" : "mongo1c:27001",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 523,
                        "optime" : Timestamp(1376937880, 1),
                        "optimeDate" : ISODate("2013-08-19T18:44:40Z"),
                        "lastHeartbeat" : ISODate("2013-08-19T18:53:23Z"),
                        "lastHeartbeatRecv" : ISODate("2013-08-19T18:53:21Z"),
                        "pingMs" : 0,
                        "syncingTo" : "mongo1c:27002"
                }
        ],
        "ok" : 1
}


Notice how the stateStr field identifies the PRIMARY (writer) of the set. Unlike MySQL, the PRIMARY role can move around on the fly, either automatically through an election or through manual actions (such as taking a node offline for maintenance work). Helpers such as rs.freeze(), rs.stepDown() and rs.remove() exist to manipulate the replica set. You can also run db.isMaster() on the instance you are logged into for another view of which member is the PRIMARY writer:

w4:PRIMARY> db.isMaster()
{
        "setName" : "w4",
        "ismaster" : true,
        "secondary" : false,
        "hosts" : [
                "mongo1c:27002",
                "mongo1c:27001",
                "mongo1c:27003"
        ],
        "primary" : "mongo1c:27002",
        "me" : "mongo1c:27002",
        "maxBsonObjectSize" : 16777216,
        "maxMessageSizeBytes" : 48000000,
        "localTime" : ISODate("2013-08-19T18:58:40.488Z"),
        "ok" : 1
}
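The same information can be pulled out of rs.status() programmatically. Below is a sketch of a hypothetical summarizeReplSet() helper. It reports which member is PRIMARY and how many seconds each SECONDARY's optimeDate trails the primary's. It assumes optimeDate values are Date objects, which is what ISODate() produces in the mongo shell. The lag is measured against the primary's last write, so an idle primary will always show zero lag.

```javascript
// Hypothetical helper: summarize rs.status() output (not a MongoDB API).
function summarizeReplSet(status) {
  var primary = null;
  var i;
  for (i = 0; i < status.members.length; i++) {
    if (status.members[i].stateStr === "PRIMARY") primary = status.members[i];
  }

  var summary = { set: status.set, primary: primary ? primary.name : null, secondaries: [] };
  if (!primary) return summary; // e.g. mid-election: nobody is accepting writes

  for (i = 0; i < status.members.length; i++) {
    var m = status.members[i];
    if (m.stateStr !== "SECONDARY") continue;
    summary.secondaries.push({
      name: m.name,
      healthy: m.health === 1,
      // Seconds this member's last applied op trails the primary's.
      lagSecs: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000
    });
  }
  return summary;
}

// Trimmed copy of the rs.status() output shown above.
var statusSample = {
  set: "w4",
  members: [
    { name: "mongo1c:27002", health: 1, stateStr: "PRIMARY",
      optimeDate: new Date("2013-08-19T18:44:40Z") },
    { name: "mongo1c:27003", health: 1, stateStr: "SECONDARY",
      optimeDate: new Date("2013-08-19T18:44:40Z") },
    { name: "mongo1c:27001", health: 1, stateStr: "SECONDARY",
      optimeDate: new Date("2013-08-19T18:44:40Z") }
  ]
};
console.log(JSON.stringify(summarizeReplSet(statusSample)));
```

In the shell this would be summarizeReplSet(rs.status()). Its primary field should agree with the primary reported by db.isMaster().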


Check Sharding Status

Connect to the query router (mongos) to view the sharding configuration:

$ mongo localhost:27108/admin -u admin -p
mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
    {  "_id" : "db1",  "host" : "db1:27001,db2:27001,db3:27001" }
    {  "_id" : "db2",  "host" : "db3:27002,db1:27002,db2:27002" }
    {  "_id" : "db3",  "host" : "db2:27003,db3:27003,db1:27003" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "generators",  "partitioned" : true,  "primary" : "db1" }
      generators.sensor_readings
        chunks:
          db3    3
          db2    6
          db1    6
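The chunk counts at the end of sh.status() are worth a quick sanity check. This sketch uses a hypothetical chunkBalance() helper to compute the spread between the fullest and emptiest shard and compare it with the balancer's migration threshold. The thresholds are assumed from the 2.x balancer documentation: 2 chunks when a collection has fewer than 20 chunks, 4 for 20 to 79, and 8 for 80 or more. Check them against the documentation for your version.

```javascript
// Hypothetical helper: compare per-shard chunk counts with the balancer's
// migration threshold. Thresholds are assumed from the 2.x documentation:
// 2 (< 20 chunks), 4 (20-79 chunks), 8 (80+ chunks).
function chunkBalance(chunksByShard) {
  var total = 0;
  var maxShard = null, minShard = null;
  for (var shard in chunksByShard) {
    if (!chunksByShard.hasOwnProperty(shard)) continue;
    var n = chunksByShard[shard];
    total += n;
    if (maxShard === null || n > chunksByShard[maxShard]) maxShard = shard;
    if (minShard === null || n < chunksByShard[minShard]) minShard = shard;
  }
  if (maxShard === null) return null; // no shards listed

  var spread = chunksByShard[maxShard] - chunksByShard[minShard];
  var threshold = total < 20 ? 2 : (total < 80 ? 4 : 8);
  return {
    total: total,
    spread: spread,
    threshold: threshold,
    fromShard: maxShard,   // shard with the most chunks
    toShard: minShard,     // shard with the fewest chunks
    wouldMigrate: spread >= threshold
  };
}

// Chunk counts for generators.sensor_readings from the sh.status() output above.
console.log(JSON.stringify(chunkBalance({ db3: 3, db2: 6, db1: 6 })));
// {"total":15,"spread":3,"threshold":2,"fromShard":"db2","toShard":"db3","wouldMigrate":true}
```

With the counts above, the spread is 3, which meets the assumed threshold of 2 for 15 chunks. If the balancer is enabled, it would therefore be expected to move chunks toward db3.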


