Picture it: you've just tumbled through a hole in Mt. Ebott into a bed of golden flowers. Before there's a town, a shop, or a single monster to talk to, there has to be an Underground for them to live in. The database IS the mountain. Tables get built inside it later. This chapter is short on purpose: you only do this part once.
One command brings the whole world into existence. You'll run this exactly once, ever, for this save file. When you start a database, always remember that Create database
Create database MidGame, it will be called that.
CREATE DATABASE Underground;
Every SQL statement ends with a semicolon ;, it's the little sparkle that tells the game your turn is over. SQL won't move on until it sees one.
Creating the mountain doesn't put you inside it. It's like the game booting to a title screen, you still have to press start. That's what USE does:
USE Underground;
Run this right after creating the database, and again every time you reopen your save. It tells SQL which world all your future commands happen in.
Forget USE and every command after it fails with a confusing error. SQL is standing at the surface going "which fallen human, exactly?"
A table is like a party roster, it has named columns and rows of data. You design the columns once upfront, and monsters get added later. Let's build our monsters table from scratch, one piece at a time.
Every table starts with CREATE TABLE. Those two words never change. What changes is the name right after, we're calling ours monsters, but party or residents would work too. No spaces in the name, use underscores. Everything that defines the table lives inside the parentheses:
CREATE TABLE monsters (
);
The first thing we add is an id column. Every table needs one, think of it as a SOUL, unique to that row no matter how similar two monsters look. Three keywords work together:
CREATE TABLE monsters ( id INT PRIMARY KEY AUTO_INCREMENT, );
A table can only have ONE PRIMARY KEY. Ever. Put it on two columns and SQL refuses to spawn the table.
Now a column for the monster's name. Names are text, so we use VARCHAR, it always needs a number in parentheses, the max character count. VARCHAR(100) means up to 100 characters. We add NOT NULL too, a nameless monster is bad for morale, so this rule is enforced automatically.
CREATE TABLE monsters ( id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(100) NOT NULL, );
Phone number and location are both text too, so both reuse the VARCHAR pattern with different lengths. Neither gets NOT NULL, because both are optional, some monsters just don't own a phone. With no NOT NULL and no value given, SQL stores NULL: the total absence of a value.
CREATE TABLE monsters ( id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(100) NOT NULL, phone_number VARCHAR(20), location VARCHAR(50) );
Notice the last column has no trailing comma. Every column except the last needs one, miss it and SQL throws a syntax error.
| Type | What it stores | Example |
|---|---|---|
| INT | Whole numbers only. | 1, 42, 99 |
| VARCHAR(n) | Text up to n characters. | name, location |
| TEXT | Long text, no length limit. | battle quotes, flavor text |
| DECIMAL(8,2) | Numbers with decimal places. | 12.50, 4.00 |
| BOOLEAN | True or false only. | is_hostile |
| DATE | A calendar date. | '2015-09-15' |
| DATETIME | A date combined with a time. | '2015-09-15 09:30' |
Always use DECIMAL(8,2) for gold prices, never INT, which would silently drop the spare change.
Each shop purchase needs to know which monster made it, so we store that monster's ID inside the purchase. REFERENCES is what's new here. You can see we're using monsters(id). The (id) inside the parentheses is the primary key of the monsters table we just made (if you don't know already lol). That stored ID is called a foreign key. A foreign key is a column in one table that points to the primary key of another table. This is what connects the two tables together.
CREATE TABLE shop_purchases ( id INT PRIMARY KEY AUTO_INCREMENT, monster_id INT REFERENCES monsters(id), item_name VARCHAR(100) NOT NULL, price_gold DECIMAL(8,2), purchase_date DATE );
You can't delete the monsters table while shop_purchases still points at it. Always remove the child table before the parent.
First, write INSERT INTO. This tells SQL you want to add a new row.
Next, write the name of the table you want to add data to. In our case, the table is called monsters.
After the table name, put the column names inside parentheses. These are the names you gave your columns when you created the table (name, phone_number, and location). Think of it as telling SQL, "These are the columns I want to fill."
So it should look like this:
INSERT INTO monsters(name, phone_number, location)
Next comes VALUES. Inside its parentheses, write the data you want to add in the same order as the columns above.
VALUES ('Mike', '555-1234', 'Dark Forest');
INSERT INTO monsters (name, phone_number, location) VALUES ('Sans', '555-0100', 'Snowdin'); INSERT INTO monsters (name, phone_number, location) VALUES ('Undyne', '555-0155', 'Waterfall');
| id | name | phone_number | location |
|---|---|---|---|
| 1 | Sans | 555-0100 | Snowdin |
| 2 | Undyne | 555-0155 | Waterfall |
The number of columns listed must exactly match the number of values. Any column marked NOT NULL must be included, or the insert fails.
Here's the cast we'll keep pulling from for the rest of the guide. No portraits, just the CHECK screen, the way the game itself would show you.
Don't just read the CHECK cards above, build them. Make a monster_stats table and insert each cast member's ATK/DEF from the cards. Match the monster_id to the order above (Sans = 1, Papyrus = 2, Undyne = 3, Alphys = 4, Toriel = 5, Asgore = 6).
CREATE TABLE monster_stats ( id INT PRIMARY KEY AUTO_INCREMENT, monster_id INT REFERENCES monsters(id), atk INT, def INT ); INSERT INTO monster_stats (monster_id, atk, def) VALUES (1, 1, 1), -- Sans (2, 3, 3), -- Papyrus (3, 15, 7), -- Undyne (4, 4, 10), -- Alphys (5, 6, 6), -- Toriel (6, 8, 8); -- Asgore
Once that table exists, every CHECK card above stops being flavor text, it's a row you can SELECT, WHERE, and JOIN against for the rest of this guide. That's exactly the muscle you're building.
You've got monsters sitting in tables now. Time to actually check them, this is the thing you'll do 10,000 times as a DBA, so don't rush it.
SELECT means "show me." FROM means "from where." They're always together. After SELECT, list the columns you want, separated by commas, in whatever order you want them shown:
SELECT name, location FROM monsters; -- two columns, in this exact order, from the monsters table
Want every column instead of typing them all out? The asterisk * means "give me all of them":
SELECT * FROM monsters;
SELECT * is great for exploring a table you don't know well. In real scripts, name your columns, if someone adds a column later, * silently grabs that too.
AS renames a column, but only in what you see on screen. The table itself never changes. Think of it as just temporary.
The table under me is using a table we made, but we never added anything to it. Before you go into it, use what skills you learned to add information before going foward
SELECT price_gold AS cost, name AS buyer FROM shop_purchases;
Want to know which items exist at all, not how many times each sold? DISTINCT collapses everything down to one copy of each unique value:
SELECT DISTINCT item_name FROM shop_purchases; -- Nice Cream (once), Spider Cider (once), Bisicle (once)...
You cannot use DISTINCT and GROUP BY in the same query. Pick one, DISTINCT for unique values, GROUP BY for math per category (Chapter 6).
SELECT shows you EVERYTHING. WHERE says "yeah, but only the rows that match THIS." It's the most important keyword after SELECT itself.
WHERE goes right after FROM, always. The condition you write gets checked against every row, true, it survives; false, it flees the battle.
SELECT * FROM shop_purchases WHERE item_name = 'Nice Cream'; -- only rows where item_name is exactly 'Nice Cream' survive
WHERE cannot use COUNT, SUM, AVG, or any math function on a group of rows. That's exactly what HAVING is for (Chapter 6).
WHERE price_gold = 10 -- exactly 10 WHERE price_gold != 10 -- anything but 10 WHERE price_gold > 10 -- more than 10 WHERE price_gold < 10 -- less than 10 WHERE price_gold >= 10 -- 10 or more WHERE price_gold <= 10 -- 10 or less
These work on dates too, WHERE purchase_date > '2015-09-01' is completely valid. SQL knows how to compare dates as "earlier" or "later," not just numbers.
-- Only Nice Cream purchases over 10 gold: SELECT * FROM shop_purchases WHERE item_name = 'Nice Cream' AND price_gold > 10;
Mixing AND and OR? Use parentheses. SQL silently reads AND before OR, without them you get a query that runs fine and quietly returns the wrong monsters.
-- instead of: WHERE location = 'Snowdin' OR location = 'Waterfall' OR location = 'Hotland' -- just write: WHERE location IN ('Snowdin', 'Waterfall', 'Hotland')
Flip it with NOT IN to exclude a list instead: WHERE location NOT IN ('Snowdin', 'Waterfall') returns every monster outside those two areas.
WHERE price_gold BETWEEN 8 AND 60 -- same as: WHERE price_gold >= 8 AND price_gold <= 60 -- exactly 8 or exactly 60 both count
% means "anything at all here, including nothing." _ means "exactly one mystery character."
WHERE name LIKE 'S%' -- starts with S WHERE name LIKE '%e' -- ends with e WHERE name LIKE '%a%' -- 'a' appears anywhere in the name WHERE name LIKE 'S_ns' -- S, then one character, then ns, matches 'Sans'
LIKE is slow on big tables, it checks every row, character by character. Fine for 500 rows. A genuine nightmare for 50 million.
NULL means "no value was ever recorded here." Not zero, not an empty string, the total absence of data. You cannot check for it with an equals sign.
WHERE phone_number IS NULL -- no phone on file WHERE phone_number IS NOT NULL -- a number exists
WHERE phone_number = NULL will NEVER match anything, ever. No error, it quietly returns 0 rows. NULL means "unknown," and SQL refuses to call an unknown equal to anything. Always use IS NULL / IS NOT NULL.
Now that you can pull exactly the rows you want, let's control the order they come back in, and stop the database from dumping ten thousand items on you at once.
SELECT * FROM shop_purchases ORDER BY price_gold DESC; -- priciest purchase shows up first
ORDER BY location ASC, name ASC -- sort by location first, then by name within each location
ORDER BY never uses AND. ORDER BY location AND name looks reasonable but is wrong, use a comma.
SELECT * FROM shop_purchases ORDER BY price_gold DESC LIMIT 10; -- top 10 most expensive purchases, nothing else
LIMIT is the MySQL/PostgreSQL keyword. SQL Server uses TOP, Oracle uses FETCH FIRST. Same move, three different names.
SQL can crunch numbers for you instead of you exporting everything to a spreadsheet first. Aggregate functions take a whole bunch of rows and squish them down into a single stat.
SELECT COUNT(*) AS total_purchases, -- every row, including NULLs COUNT(discount) AS discounted, -- skips NULL values SUM(price_gold) AS total_gold, -- add them all up AVG(price_gold) AS avg_price, -- the average MIN(price_gold) AS cheapest, -- smallest value MAX(price_gold) AS priciest -- biggest value FROM shop_purchases;
No WHERE clause here, so it's crunching every row in the whole table into one line of results, one total count, one grand total, one average. That's an aggregate's default move: collapse everything to one number, unless told otherwise (which is exactly what GROUP BY does next).
COUNT(*) counts every row. COUNT(discount) skips NULLs. Out of 100 purchases where 60 got a Tem discount: COUNT(*) = 100, COUNT(discount) = 60.
SELECT item_name, COUNT(*) AS times_bought FROM shop_purchases GROUP BY item_name; -- one row per distinct item, with how many times it sold
Every column in SELECT that is NOT wrapped in an aggregate function must also appear in GROUP BY.
WHERE filters individual rows before grouping. HAVING filters entire groups after grouping.
SELECT item_name, COUNT(*) AS times_bought FROM shop_purchases GROUP BY item_name HAVING COUNT(*) > 5; -- only items bought more than 5 times total
HAVING without GROUP BY makes no sense, there are no groups to filter. Not grouping anything? You want WHERE instead.
Before you JOIN tables, you need to understand HOW they're related. Three types exist, and getting this right saves you a world of pain later.
One row in Table A links to many rows in Table B. A single monster can make many purchases, but each purchase belongs to exactly one monster. The rule for the foreign key: the "many" side holds it.
One monster can belong to many friend groups, and one friend group can have many monsters. Neither side is the clear "one." The fix: a bridge table storing pairs of ids.
CREATE TABLE monster_groups ( monster_id INT REFERENCES monsters(id), group_id INT REFERENCES groups(id), PRIMARY KEY (monster_id, group_id) ); -- a composite primary key: the COMBINATION must be unique
Each monster has exactly one stat sheet, and each stat sheet belongs to exactly one monster. The only difference from one-to-many: add UNIQUE to the foreign key.
CREATE TABLE monster_stats ( id INT PRIMARY KEY AUTO_INCREMENT, monster_id INT UNIQUE REFERENCES monsters(id), hp INT DEFAULT 20 );
| Relationship | Example | Where's the FK? | Special constraint |
|---|---|---|---|
| One-to-Many | Monster โ Purchases | "Many" side (purchases) | None, FKs can repeat |
| Many-to-Many | Monsters โ Groups | Bridge table | Composite PRIMARY KEY |
| One-to-One | Monster โ Stat Sheet | "Child" side | UNIQUE on the FK |
Before you combine two tables, there's one small idea that has to click first. Let's slow all the way down and build it from zero, nothing here depends on anything later in the guide.
Here are our two tables again, written out in full. Look closely, both have a column called id:
CREATE TABLE monsters ( id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(100) NOT NULL, location VARCHAR(50) ); CREATE TABLE shop_purchases ( id INT PRIMARY KEY AUTO_INCREMENT, monster_id INT REFERENCES monsters(id), item_name VARCHAR(100) NOT NULL, price_gold DECIMAL(8,2) );
monsters has its own id, one per monster. shop_purchases has its own id too, one per purchase. Two completely different columns that just happen to share a name. Now say this query out loud:
SELECT id FROM monsters, shop_purchases;
Which id does that mean, the monster's, or the purchase's? SQL can't guess, and it won't try. It needs to be told exactly which table you're pointing at.
Run a query with an unclear column name like the one above and SQL stops you cold: Error: column 'id' is ambiguous. This isn't a rare edge case, it's the very first error most people hit the moment a second table shows up.
The fix is a small naming pattern called dot notation. Write the table name, then a period, then the column name, no spaces:
monsters.id shop_purchases.id
Read it left to right like a mailing address: the table name is the building, the dot just means "inside of," and the column name is the door number. Here's the Step 1 query again, this time fully labeled and safe to run:
SELECT monsters.id, monsters.name FROM monsters;
And here's dot notation doing real work, a query that reaches into both tables at once, matching each purchase back to the monster who made it:
SELECT monsters.name, shop_purchases.item_name, shop_purchases.price_gold FROM monsters, shop_purchases WHERE monsters.id = shop_purchases.monster_id;
Every column is labeled with the table it lives in, so there's zero confusion about which id, which name, or which price_gold is meant, even with two tables sitting in the same query.
You only need dot notation once two or more tables show up in the same question. Asking about just monsters? SELECT id FROM monsters; is completely fine on its own, there's only one table around, so id was never confusing in the first place.
Typing monsters and shop_purchases out in full, over and over, gets tiring fast. SQL lets you give a table a short nickname, called an alias, right where you name it in FROM, using the word AS:
SELECT m.name, sp.item_name, sp.price_gold FROM monsters AS m, shop_purchases AS sp WHERE m.id = sp.monster_id;
That's the exact same query as Step 2, just shorter. monsters AS m means "call this table m from here on," so m.name means exactly what monsters.name meant before. Same trick on the other table: shop_purchases AS sp lets you write sp.item_name instead of spelling the whole name out every time.
AS is technically optional, monsters m works exactly like monsters AS m, but writing AS explicitly makes the query easier for someone else, including future you, to read at a glance.
That's the entire idea: give each table a short nickname if you want one, then use table.column (or nickname.column) any time you need to say exactly which column, in which table, you mean. Keep it in your back pocket, you'll reach for it constantly starting in the next chapter.
Your monsters table has names. Your shop_purchases table has what they bought. Neither one alone tells the whole story. A JOIN glues matching rows from two tables into one combined row, let's build it from real data, one piece at a time.
Same two tables from last chapter, now with some purchases in them:
INSERT INTO shop_purchases (monster_id, item_name, price_gold) VALUES (1, 'Nice Cream', 15.00), -- Sans (3, 'Spider Cider', 9.50), -- Undyne (3, 'Spider Donut', 6.00), -- Undyne (5, 'Butterscotch Pie', 22.00); -- Toriel
Notice Papyrus (2), Alphys (4), and Asgore (6) never show up as a monster_id, they haven't bought anything. Keep that in mind, it matters in Step 2. A plain JOIN connects the two tables using ON, matching wherever shop_purchases.monster_id equals monsters.id:
SELECT monsters.name, shop_purchases.item_name, shop_purchases.price_gold FROM monsters JOIN shop_purchases ON monsters.id = shop_purchases.monster_id;
Run that, and here's exactly what comes back:
| name | item_name | price_gold |
|---|---|---|
| Sans | Nice Cream | 15.00 |
| Undyne | Spider Cider | 9.50 |
| Undyne | Spider Donut | 6.00 |
| Toriel | Butterscotch Pie | 22.00 |
That's the whole idea of a plain JOIN (also called INNER JOIN): it only keeps rows with a match on both sides. Papyrus, Alphys, and Asgore had nothing in shop_purchases, so they're simply gone from the result, not shown with empty values, just absent entirely.
Forget the ON clause and SQL has no idea how to match the rows, so it pairs EVERY row in one table with EVERY row in the other, a cartesian product. Two tables of 6 and 4 rows would return 24 garbage rows instead of 4. Always include ON.
What if you want every monster to show up, purchases or not? Swap JOIN for LEFT JOIN. "Left" means the table named right after FROM, here, monsters:
SELECT monsters.name, shop_purchases.item_name FROM monsters LEFT JOIN shop_purchases ON monsters.id = shop_purchases.monster_id;
Same query, same tables, but a very different result. Every one of the 6 monsters is back, with NULL standing in wherever there's no matching purchase:
| name | item_name |
|---|---|
| Sans | Nice Cream |
| Papyrus | NULL |
| Undyne | Spider Cider |
| Undyne | Spider Donut |
| Alphys | NULL |
| Toriel | Butterscotch Pie |
| Asgore | NULL |
LEFT JOIN is the standard move any time the question is really "find the ones missing something", monsters with no purchases, players with no achievements, orders with no reviews. A plain JOIN would have silently dropped all three of them.
Remember the alias trick from last chapter? This is exactly where it earns its keep. The Step 1 query, written shorter, means the exact same thing:
SELECT m.name, sp.item_name, sp.price_gold FROM monsters AS m JOIN shop_purchases AS sp ON m.id = sp.monster_id; -- 'm' and 'sp' stand in for the full table names, same result as Step 1
Reading data is cool. Now let's actually mess with it. With great DETERMINATION comes great responsibility, some of these commands change or delete things permanently, with no LOAD button.
INSERT INTO monsters (name, phone_number, location) VALUES ('Papyrus', '555-0101', 'Snowdin'); -- Auto_increment id? Don't include it. SQL handles it.
INSERT INTO monsters (name, phone_number, location) VALUES ('Alphys', '555-0177', 'Hotland'), ('Asgore', '555-0199', 'New Home'), ('Toriel', '555-0111', 'Ruins'); -- three rows, three monsters, one statement
Columns and values must match in count, every time. Any column marked NOT NULL must be included with a real value.
UPDATE shop_purchases SET price_gold = 14.99 WHERE id = 42; -- relative to its own current value: UPDATE shop_purchases SET price_gold = price_gold * 1.10 WHERE item_name = 'Spider Cider'; -- a 10% price bump on every Spider Cider purchase, all at once
Forget WHERE on an UPDATE and every row in the entire table gets changed. There's no LOAD button. Always write WHERE.
DELETE FROM shop_purchases WHERE id = 42; -- removes exactly one purchase: the one with id 42
DELETE FROM shop_purchases; with no WHERE wipes every row in the table. Always double, even triple, check your WHERE clause. This isn't a RESET you can undo.
You made a table and realized you forgot a column. Or the name's wrong. Or requirements changed and the whole thing needs to go. No need to start a new file from scratch.
-- Add a new column: ALTER TABLE shop_purchases ADD discount DECIMAL(5,2); -- Remove a column (and all its data): ALTER TABLE shop_purchases DROP COLUMN discount; -- Rename a column: ALTER TABLE shop_purchases RENAME COLUMN discount TO discount_amount; -- Change a column's type: ALTER TABLE shop_purchases MODIFY COLUMN price_gold DECIMAL(10,2);
When you ADD a column, every existing row gets it too, there's no adding a column to "just the new rows." Existing rows get NULL there unless you specify a DEFAULT.
Dropping a column deletes all the data inside it, for every row, permanently. Check twice, drop once.
DROP TABLE shop_purchases; -- Safer: won't crash if the table doesn't exist DROP TABLE IF EXISTS shop_purchases;
You can't drop monsters while shop_purchases still points at it. Always go child table, then parent table.
An index is like a shortcut through Waterfall instead of the long way around. Without one, SQL has to check every single row to find what you're looking for.
CREATE INDEX idx_monster_id ON shop_purchases(monster_id); -- speeds up any query that filters or joins on monster_id
Index columns you frequently filter on (WHERE), join on (ON), or sort by (ORDER BY). Don't index everything just in case.
SQL is picky about order. Every clause has a fixed spot in the sentence and genuinely cannot move. Memorize the order below and roughly half of your beginner errors just disappear.
SELECT -- 1. what columns to show FROM -- 2. which table JOIN -- 3. link another table (optional) WHERE -- 4. filter ROWS (before grouping) GROUP BY -- 5. group them HAVING -- 6. filter GROUPS (after grouping) ORDER BY -- 7. sort the results LIMIT -- 8. cut it to N rows
You don't need all 8 every single time, most everyday queries only use 2 or 3 of these.
"Show me the top 10 most-purchased items by Snowdin monsters, but only ones bought more than 5 times." Read top to bottom and it's just every chapter of this guide, stacked in the one order SQL demands:
SELECT sp.item_name, COUNT(*) AS cnt FROM shop_purchases sp JOIN monsters m ON sp.monster_id = m.id WHERE m.location = 'Snowdin' GROUP BY sp.item_name HAVING COUNT(*) > 5 ORDER BY cnt DESC LIMIT 10;