Data Modeling

What is data modeling?

Data modeling is describing something in the real world (e.g., a user or a car) as a structured set of information.

Modeling Real Things

Kinds of data

Strings (text inside quotes: "Hi my name is Liz.")
Numbers (1, 3, 27, 49)
Booleans (true or false)
Arrays (ordered lists of anything: ["Hello", 27, "Arugula"])
Objects (collections of key-value pairs)

An object looks like this:

{
    "name" : "value",
    "property" : "value",
    "age" : 35,
    "weight" : 180,
    "favorite_foods" : ["Artichoke", "Alfalfa", "Anchovies"],
    "favorite_books" : [{"title": "Moby Dick", "author" : "Herman Melville"}, {"title" : "Where the Wild Things Are", "author" : "Maurice Sendak"}]
}

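In JavaScript, every value has a data type, which you can check with the typeof operator. Types tell us what kind of data we're storing: strings, numbers, booleans, objects, and so on. (typeof reports arrays as "object", since arrays are a kind of object.)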
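A small sketch of checking types in JavaScript, using values from the list above (the `examples` object is just an illustrative container, not something from the original). Because `typeof` lumps arrays in with objects, `Array.isArray` is the usual way to tell them apart:

```javascript
// Illustrative values, one for each kind of data in the list above.
const examples = {
  string: "Hi my name is Liz.",
  number: 27,
  boolean: true,
  array: ["Hello", 27, "Arugula"],
  object: { name: "Liz", age: 35 },
};

// Print the type JavaScript reports for each value.
for (const [label, value] of Object.entries(examples)) {
  console.log(label, "->", typeof value); // e.g. "array -> object"
}

// typeof says "object" for arrays too, so check arrays explicitly.
console.log(Array.isArray(examples.array));  // true
console.log(Array.isArray(examples.object)); // false
```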

Data Comes in Hierarchies

Often, data is "nested" in "hierarchies" of objects within objects.
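To make the hierarchy idea concrete, here is a sketch that reuses the object from the previous slide (the name "Liz" is filled in for illustration). Each level down the hierarchy is one more `.key` or `[index]`:

```javascript
// A person record with nested data: an array of strings and an array of objects.
const person = {
  name: "Liz",
  age: 35,
  favorite_foods: ["Artichoke", "Alfalfa", "Anchovies"],
  favorite_books: [
    { title: "Moby Dick", author: "Herman Melville" },
    { title: "Where the Wild Things Are", author: "Maurice Sendak" },
  ],
};

// One step per level: object -> array -> element -> property.
console.log(person.favorite_foods[0]);        // Artichoke
console.log(person.favorite_books[1].author); // Maurice Sendak

// Walking a nested list: collect every author this person likes.
const authors = person.favorite_books.map((book) => book.author);
console.log(authors); // [ 'Herman Melville', 'Maurice Sendak' ]
```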

Methods of storing

Plain text (.txt)
CSV (.csv, comma-separated values)
JSON (.json, JavaScript Object Notation)
Relational databases (queried with SQL, e.g., Oracle, PostgreSQL)
Non-relational databases (MongoDB, Cassandra, etc.)

These days, there are many different ways to store data, and some handle nested structure better than others.
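As a rough comparison of two of these formats, the sketch below writes the same two records as JSON and as CSV. The CSV writer is deliberately naive (an assumption for brevity): it does not escape commas or quotes inside values, which a real CSV library would handle.

```javascript
// The same two records, stored two different ways.
const books = [
  { title: "Moby Dick", author: "Herman Melville" },
  { title: "Where the Wild Things Are", author: "Maurice Sendak" },
];

// JSON keeps the structure (keys, nesting, arrays) intact.
const asJson = JSON.stringify(books, null, 2);

// CSV is a flat table: one header row, then one row per record.
// Naive: values containing commas or quotes would break this.
const columns = ["title", "author"];
const asCsv = [
  columns.join(","),
  ...books.map((book) => columns.map((col) => book[col]).join(",")),
].join("\n");

console.log(asJson);
console.log(asCsv);
// title,author
// Moby Dick,Herman Melville
// Where the Wild Things Are,Maurice Sendak

// Parsing JSON gives back the original objects; CSV would need a parser.
const roundTrip = JSON.parse(asJson);
console.log(roundTrip[1].author); // Maurice Sendak
```

Flat records like these fit CSV fine; once a record holds arrays or nested objects, JSON (or a database) keeps that structure without extra conventions.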

Structure

car = {
    "name": "Herbie",
    "make": "Volkswagen",
    "model": "Bug",
    "purpose": "Love",
    "engineType": "Back",
    "color": "Stripes",
    "year": "1970"
}

Might not be as helpful as...

car = {
    "name": "Herbie",
    "make": "Volkswagen",
    "model": "Bug",
    "purposes": ["Love", "Driving around", "Saving people?"],
    "engine": {"location": "Back", "cylinders": 4, "fuel-injected": false, "loud": true},
    "description": {"paint_profile": "Stripes", "colors": ["black", "white", "silver"], "attitude": "sassy"},
    "year": "1970"
}

This matters a lot!

Data can limit your capabilities, or expand them.

Grouping related pieces of data together helps you stay organized, makes the data easier for programs to search and query, and lets engineers write simpler code against it.

if (car.engine.loud && car.description.attitude === "sassy") {
    console.log("I think Herbie the Love Bug is comin' down the road!");
}
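Here is a sketch of how the two car models differ when you ask them questions. Both objects are trimmed-down versions of the slides above; the variable names `flatCar` and `structuredCar` are just for illustration:

```javascript
// Flat version: a single purpose, engine details squeezed into one string.
const flatCar = {
  name: "Herbie",
  purpose: "Love",
  engineType: "Back",
};

// Structured version: a list of purposes and an engine object.
const structuredCar = {
  name: "Herbie",
  purposes: ["Love", "Driving around", "Saving people?"],
  engine: { location: "Back", cylinders: 4, "fuel-injected": false, loud: true },
};

// With a list, "is this car used for X?" is a single call.
console.log(structuredCar.purposes.includes("Driving around")); // true

// The flat model stores only one purpose, so it has to answer false.
console.log(flatCar.purpose === "Driving around"); // false

// Keys containing "-" need bracket notation.
console.log(structuredCar.engine["fuel-injected"]); // false

// The flat model has nowhere to record engine noise at all.
console.log(flatCar.engine); // undefined
```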

How are things the same?

Let's try to model these books together.

What unique properties do they have? What properties do they share?


What does it mean to be a book?

{
    title: "",
    author: "",
    length: "",
    ISBN: "",
    cover: "",
    language: "",
    customer_rating: "",
    tags: "",
    amazon_link: "http://www.amazon.com/The-Power-Habit-What-Business/dp/1400069289/ref=sr_1_1?ie=UTF8&qid=1355257104&sr=8-1&keywords=power+of+habit"
}

{
    title: "",
    author: "",
    length: "",
    ISBN: "",
    cover: "",
    language: "",
    customer_rating: "",
    tags: "",
    amazon_link: "http://www.amazon.com/Hyperspace-Scientific-Parallel-Universes-Dimension/dp/0195085140/ref=tmm_hrd_title_0?ie=UTF8&qid=1355257238&sr=1-1",
    amazon_link2: "http://www.amazon.com/Hyperspace-Scientific-Odyssey-Parallel-Universes/dp/0385477058/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&colid=N55WK4E5RGPM&coliid=I39ORQQR1YUM18"
}
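One way to handle a book that needs more than one link is to make links a list that every book has, even if it is empty. This is a minimal sketch: `makeBook` is a hypothetical helper, and the example.com URLs are placeholders standing in for the real product pages.

```javascript
// Hypothetical helper: every book gets the same set of fields,
// and links live in one array instead of amazon_link, amazon_link2, ...
function makeBook({ title = "", author = "", isbn = "", links = [], tags = [] } = {}) {
  return { title, author, ISBN: isbn, links, tags };
}

// Placeholder URLs stand in for the real listings.
const book = makeBook({
  title: "Hyperspace",
  links: ["https://example.com/listing-1", "https://example.com/listing-2"],
});

console.log(book.links.length); // 2
console.log(makeBook().links);  // [] (a book with no links yet)

// Code that reads links no longer cares how many there are.
for (const link of book.links) {
  console.log(link);
}
```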

Baselines

Usually we work from the general (what do all of these things have in common?) to the specific (what makes each one unique?).

Asking "what does something have to have at minimum to be this type of thing?" is a good way to find out a lot about your models.
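Once you've decided on that minimum, you can check records against it. In this sketch, treating `title` and `author` as the only required book fields is an assumption made for illustration, and `missingFields` is a hypothetical helper:

```javascript
// Assumed baseline: the fields we decide every book must have.
const REQUIRED_BOOK_FIELDS = ["title", "author"];

// Returns the required fields that are missing or empty strings.
function missingFields(candidate, required) {
  return required.filter(
    (field) => candidate[field] === undefined || candidate[field] === ""
  );
}

const complete = { title: "Moby Dick", author: "Herman Melville", tags: ["whales"] };
const incomplete = { title: "Moby Dick" };

console.log(missingFields(complete, REQUIRED_BOOK_FIELDS));   // []
console.log(missingFields(incomplete, REQUIRED_BOOK_FIELDS)); // [ 'author' ]
```

Extra fields like `tags` are allowed; only the baseline is enforced.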

Schema.org has done a lot of this work; check it out for examples.

Exercise Time!

Relationships

We have types of databases that enforce relationships: Relational Database Management Systems (RDBMS).

Non-relational Systems

We also have types of databases that don't enforce relationships: non-relational database systems.

Modeling Relationships

Or... how does a non-relational DB work?

Most of the time, pieces of data are related to one another.
It's important to express these relationships properly: don't constrain them too tightly, and don't leave them too loose.
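To see the difference between the two styles, here is a sketch of the same author/book relationship stored both ways. Plain JavaScript arrays stand in for database tables and documents (no real database API is used), and the ids are made up. The relational style stores each thing once and links records by id. The document style, common in document databases such as MongoDB, nests related data inside one record.

```javascript
// Relational style: separate "tables"; each book points at its author by id.
const authors = [
  { id: 1, name: "Herman Melville" },
  { id: 2, name: "Maurice Sendak" },
];
const books = [
  { id: 10, title: "Moby Dick", authorId: 1 },
  { id: 11, title: "Where the Wild Things Are", authorId: 2 },
];

// Reading related data means following the id (a "join").
function booksByAuthor(authorName) {
  const author = authors.find((a) => a.name === authorName);
  if (!author) return [];
  return books.filter((b) => b.authorId === author.id).map((b) => b.title);
}
console.log(booksByAuthor("Maurice Sendak")); // [ 'Where the Wild Things Are' ]

// Document style: related data is nested inside one record, so no lookup
// is needed, but anything shared between records may get copied around.
const authorDocument = {
  name: "Herman Melville",
  books: [{ title: "Moby Dick" }],
};
console.log(authorDocument.books.map((b) => b.title)); // [ 'Moby Dick' ]
```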

Types of Relationships

One-to-One
One-to-Many
Many-to-One
Many-to-Many
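A compact sketch of each relationship type using plain objects. The users, authors, and classes here are hypothetical examples, not part of the original slides. Note that one-to-many and many-to-one are the same link seen from opposite ends, and many-to-many is commonly stored as a separate list of pairs (a "join table"):

```javascript
// One-to-one: each user has exactly one profile.
const user = { id: 1, name: "Liz", profile: { bio: "Likes artichokes" } };

// One-to-many: one author has many books.
// Many-to-one: the same link from the other side (many books, one author).
const author = { id: 7, name: "Maurice Sendak" };
const books = [
  { title: "Where the Wild Things Are", authorId: 7 },
  { title: "In the Night Kitchen", authorId: 7 },
];

// Many-to-many: students take many classes; classes have many students.
// Each pair is recorded once in its own list.
const enrollments = [
  { studentId: 1, classId: "data-modeling" },
  { studentId: 1, classId: "javascript" },
  { studentId: 2, classId: "data-modeling" },
];

const booksByAuthor = books.filter((b) => b.authorId === author.id);
const studentsInDataModeling = enrollments
  .filter((e) => e.classId === "data-modeling")
  .map((e) => e.studentId);

console.log(user.profile.bio);       // Likes artichokes
console.log(booksByAuthor.length);   // 2
console.log(studentsInDataModeling); // [ 1, 2 ]
```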

Together!

Let's try to model recipes together.

How are Foods and Recipes related?
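A food can appear in many recipes, and a recipe uses many foods (a many-to-many relationship), so it makes sense to store each food once instead of duplicating it in every recipe.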
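A minimal sketch of that idea, assuming foods are stored once in a lookup keyed by a made-up id and recipes list the ids they use. The specific foods, recipes, and the `vegetarian` flag are illustrative:

```javascript
// Each food is stored once, keyed by a hypothetical id.
const foods = {
  artichoke: { name: "Artichoke", vegetarian: true },
  anchovy: { name: "Anchovies", vegetarian: false },
  garlic: { name: "Garlic", vegetarian: true },
};

// Recipes refer to foods by id instead of copying the food data.
const recipes = [
  { name: "Artichoke dip", ingredients: ["artichoke", "garlic"] },
  { name: "Caesar dressing", ingredients: ["anchovy", "garlic"] },
];

// Which recipes use a given food? (the "many recipes per food" direction)
function recipesUsing(foodId) {
  return recipes
    .filter((recipe) => recipe.ingredients.includes(foodId))
    .map((recipe) => recipe.name);
}

// Facts about a recipe can be derived from the single copy of each food.
function isVegetarian(recipe) {
  return recipe.ingredients.every((id) => foods[id].vegetarian);
}

console.log(recipesUsing("garlic"));   // [ 'Artichoke dip', 'Caesar dressing' ]
console.log(isVegetarian(recipes[0])); // true
console.log(isVegetarian(recipes[1])); // false
```

Because each food lives in one place, correcting a food's details (say, its name) updates every recipe that uses it.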

It's mostly decision-making

Questions to ask:

What does it mean to be an object? (duuuuuude.)
If I split these objects up, what does that mean for related data?
If I create a relationship, what does that mean for related data?
If I create a hierarchy, what does that mean for sub-objects? Related data?
Will this allow me to do more things later, or restrict what I can do later?
Is it worth the time right now to have an existential crisis about this?! (Usually not.)

Exercise Time!

Don't be an "Architecture Astronaut"

"It is better to have a codebase you're moderately ashamed of that's full of hacks than nothing at all."
- Ancient Native American Proverb

Resources

Schema.org - A bunch of geeks (one of whom I live with) decided to write down once and for all what you need at minimum to be anything.
A super technical essay at agiledata.org
A whole amazing course on Coursera.org that will teach you this in far more detail than I ever could. (I signed up for this course, if you'd like to study for it with me!)