Skip to main content

Step-by-Step MariaDB

Step-by-Step MariaDB

Setting up a database:

  1. Create a database
  2. Create a user
  3. Create a table
  4. Create a column
  5. Create indexes
  6. Add foreign keys

Using the database:

  1. INSERT
  2. SELECT
  3. UPDATE
  4. DELETE

Create a Column

MariaDB Statement

ALTER TABLE tbl_name
ADD col_name data_type NOT NULL;

Where tbl_name is the name of the table, col_name is the name of the column and data_type is the data type as described below.

If one wish to allow NULL values, one has to remove the NOT keyword. It is desirable to not use NULL values if not necessary, as they take up storage space (where an empty values doesn't).

phpMyAdmin

Data Types

  1. Integer
  2. Date
  3. String
    1. ENUM and SET
  4. Real
    1. Precision
      1. Precision Error
    2. Range
    3. Accuracy
    4. Other Solutions
      1. Storing Decimal as Integer
      2. Exact Real
      3. 2-D Positioning
  5. Array
    1. Reference Table
    2. Bitwise Technique
    3. JSON
Storage
Requirement
(bytes)
Integer Real Date String 11
1 BOOLEAN DECIMAL(1, [0-1]) YEAR CHAR(1) 12
TINYINT DECIMAL(2, [even]) 1 VARCHAR([1-255])
TINYTEXT
2 SMALLINT "EXACT REAL(2)" 2 CHAR(2) 12
DECIMAL(3, [0-3]) VARCHAR([256-16 383]) 13
DECIMAL(4, [even]) 3 TEXT
3 MEDIUMINT "EXACT REAL(4)" 2 DATE CHAR(3) 12
DECIMAL(5, [0-5]) 4 TIME MEDIUMTEXT
DECIMAL(6, [even]) 5
4 INTEGER "EXACT REAL(6)" 2 TIMESTAMP CHAR(4) 12
DECIMAL(7, [0-7]) 6 LONGTEXT
DECIMAL(8, [even]) 7
DECIMAL(9, [0,9]) 8 JSON14
FLOAT
5 "EXACT REAL(9)" 2 CHAR(5) 12
DECIMAL(10, [0-2,4,6,8-10]) 9
DECIMAL(11, [0,2,9,11])
6 "EXACT REAL(11)" 2 CHAR(6) 12
DECIMAL(10, [3,5,7]) 9
DECIMAL(11, [1,3-8,10])
DECIMAL(12, [0-4,6,8-12])
DECIMAL(13, [0,2,4,9,11,13])
7 "EXACT REAL(13)" 2 CHAR(7) 12
DECIMAL(12, [5,7])
DECIMAL(13, [1,3,5-8,10,12])
DECIMAL(14, [0-6,8-14])
DECIMAL(15, [0,2,4,6,9,11,13,15])
8 BIGINT "EXACT REAL(15)" 2 DATETIME CHAR(8) 12
DECIMAL(14, 7)
DECIMAL(15, [1,3,5,7,8,10,12,14])
DECIMAL(16, [0-16])
DECIMAL(17, [0,2,4,6,8,9,11,13,15,17]) 10
DECIMAL(18, [0,9,18]) 10
DOUBLE
25 POINT 15
NOTES:
  1. Prefer TINYINT for [odd] values; otherwise add 1 byte
  2. EXACT REAL is not an official data type. See below for details.
  3. Prefer SMALLINT for [odd] values; otherwise add 1 byte
  4. Prefer SMALLINT for values where MAX - MIN <= 65 535 X 10-D (2 bytes only)
  5. Prefer MEDIUMINT for [odd] values; otherwise add 1 byte
  6. Prefer scaled MEDIUMINT (3 bytes only)
  7. Prefer INTEGER for [odd] values; otherwise add 1 byte
  8. Prefer INTEGER for values [1-8]; otherwise add 1 byte
  9. Prefer INTEGER for values where MAX - MIN <= 4 294 967 295 X 10-D (4 bytes only)
  10. Prefer BIGINT for other values; otherwise add 1 byte
  11. Except for CHAR, storage requirement represents the minimum (must add the actual string bytes)
  12. Assuming use of character set with 1 byte per character (example ascii).
    utf8mb4 has 4 bytes/character (so multiply by 4 the storage requirement).
  13. Assuming use of character set utf8mb4 and a row size limit of 65 535 bytes.
    Longer strings will be stored in MEDIUMTEXT.
  14. Alias for LONGTEXT
  15. Actually stores 2 data types, each as DOUBLE

Unless there is a very specific need, you probably need a data type taking less than 5 bytes. Higher than 8 bytes, it is most likely too much and it is a good indicator that there is probably another solution better suited for your needs.

Integer

Always prefer an INTEGER type for best performance.

Date

If you need a DATETIME (8 bytes), you may prefer a DATE and TIME separately (3 bytes each) and a separate VIRTUAL column for DATETIME (no storage), like so:

ALTER TABLE table_name
ADD date_name DATE NOT NULL,
ADD time_name TIME NOT NULL AFTER date_name,
ADD datetime_name DATETIME AS (ADDTIME(date_name, time_name)) VIRTUAL AFTER time_name;

In addition to the storage gain, this may also help create better indexes for some queries based on time or date alone.

TIMESTAMP is generally used to track record changes (INSERT & UPDATE). Prefer DATETIME to store specific values (a birthday or a hiring date, for example).

String

If you don't need to search within a string, or use it for ordering or grouping, you might prefer using an integer referring to a list in a pre-compiled code instead of storing it into the database. With the same reasoning, you shouldn't use the database for internalization either; There are other methods to do this that most likely offer better advantages.

Prefer CHAR when the string length is always the same and when you can use a character set with 1 byte per character (a serial number for example).

Prefer VARCHAR for word-like strings that make sense as a whole (a username or a URL, for example) and TEXT data types for sentences that you might need to parse for specific words (a comment or an article for example). Technically, VARCHAR takes more memory than TEXT data types when in use, but is faster because there is less overhead.

TEXT data types also make more sense if the strings are only stored and retrieved (i.e. they never appear in WHERE, GROUP BY or ORDER BY clauses).

Try not to go over a length of 255 for VARCHAR, such that only 1 byte is used for storing the string length (2 bytes otherwise). Anyway, there are very few «word-like» strings above 255 characters, it is thus a good indicator that a TEXT data type might be better suited.

ENUM and SET

When you look at the MariaDB (or MySQL) documentation, you will note two additional string data types that were not specified in the above table, namely ENUM and SET. There is a good reason for it: Don't use them! Why? Because they are not string data types, they are INTEGERs, and even MariaDB treats them as such. You can see them as constant, as defined in other programming languages - for example, const int STRING1 = 1; or final int string1 = 1;. You wouldn't refer to «string1» anywhere other than within your code, because «string1» is a constant name, not data. If you need an INTEGER, use an INTEGER. The only thing ENUM and SET add is a correspondent string value to the integer, which is something you can do elsewhere, perhaps in another table, that may or may not be used with a JOIN and linked with a FOREIGN KEY. But if you define it outside the database, at the very least, adding a comment within the database is a good idea to keep track of things.

The way to use an INTEGER as an ENUM is simply to make a one-to-one correspondence with each string (1 => "string1"; 2 => "string2"; ...). And for a SET, it is only storing an array of INTEGERs (See below for arrays). MariaDB actually uses the bitwise technique for SET.

Anyway, a program well designed would only care for a «state» (represented by the integer value) and would only seek for a «human readable» value in its last step, when presenting data. Manipulating the «human readable» value for the different operations is not a good idea and often pointless, especially when all humans may not use the same language (now or in the future).

Real

Integers are great, but they have limited range and precision, and they don't store real values. There are basically 3 data types that can take care of this: FLOAT, DOUBLE and DECIMAL. DECIMAL(M, D) is declared with 2 numbers: a precision (M) - the quantity of significant digits in a number - and a decimal (D) - i.e. the quantity of digits after the decimal point. As one can see in the table above, choosing the wrong DECIMAL can have a great impact on the storage requirement.

Precision

Let us show what precision represents. 123.45 has 5 significant digits, 12340000 has 4 significant digits and 0.0123456 has 6 significant digits. Basically, it means that the numbers could represent exact values like 123.4500258 or 12339021.243 or 0.0123456481, but the measuring instruments couldn't be more precise than 4, 5 or 6 digits.

Precision error

It is easy to estimate the maximum error on a number with a given precision. The error is always ±0.5 on the last digit. Thus, for any value of x:

PrecisionMinimum NumberMaximum ErrorMaximum Precision Error
11e10x0.5e10x50 %
21.0e10x0.05e10x5 %
31.00e10x0.005e10x0.5 %
41.000e10x0.0005e10x0.05 %
51.0000e10x0.00005e10x0.005 %
61.00000e10x0.000005e10x0.0005 %
71.000000e10x0.0000005e10x0.00005 %
101.000000000e10x0.00000000005e10x5e10-8 %
151.000000000000000e10x0.0000000000000005e10x5e10-13 %
181.000000000000000000e10x0.0000000000000000005e10x5e10-16 %
n1e10x0.5e10(x-n)5e10(2-n) %

Ordinary measuring instruments give a precision of about 7 digits (a clock that can loose 1 second every year, for example). The most precise measuring instruments made by mankind can go to about an 18-digit precision (an atomic clock that looses 1 second every 14 billion years, for example). Expecting needing a higher precision is foolish.

That being said, FLOAT can provide 7-digit precision (4 bytes), DOUBLE can provide 15-digit precision (8 bytes), and DECIMAL can go as high as 65 digits (29-30 bytes). But as seen in the table above, with only 8 bytes of storage space, you can use 16-digit precision DECIMAL (even, up to 18 with certain decimal values). Since the storage requirement of a DECIMAL is closely related to its precision (M), anything above 18 or so is a waste of storage memory.

Also, with FLOAT and DOUBLE data types, you get either a 7 or 15 -digit precision, with 4 or 8 bytes storage requirement, with nothing in between. DECIMAL can fill that void.

Range

The other limit for a real number is the range.

FLOAT can store values from -3.402823466E+38 to -1.175494351E-38, 0 and from 1.175494351E-38 to 3.402823466E+38.

DOUBLE can store values from -1.7976931348623157E+308 to -2.2250738585072014E-308, 0 and from 2.2250738585072014E-308 to 1.7976931348623157E+308.

For DECIMAL, the range is less wide and depends on the declared precision and decimal. For example, a DECIMAL(7,3) would have a range between -9999.999 and 9999.999. You can get a range similar to FLOAT by using DECIMAL(38,0) for values greater than 1 or DECIMAL(38,38) for values less than 1. To cover approximately the full range of FLOAT, you need something like DECIMAL(65,32).

Accuracy

FLOAT and DOUBLE seem to be the best choices because of the extended range with no more memory needed than with an equivalent DECIMAL. However, they have a serious problem with accuracy. Even if you can count on them to have 7 or 15 digits of precision, after some calculations, they may show more random digits than can cause unexpected behavior. For example, multiplying 2 FLOATs with value of 0.1 together to find a value x, you will end up with 0.010000000298023226 as a result (8-digit precision). If you decide to use this result in (x > 0.01), you will get TRUE, even though the expected result is FALSE. DOUBLE will offer a greater precision, but the result will still not be accurate.

On the other hand, DECIMAL offers perfect accuracy, over its range and precision.

Other solutions

Storing Decimal as Integer

You can sometime save storage space by using integers instead of decimals. For example, say you want to store numbers that will range between 0.00000 and 40000.00000. The obvious choice seems to be DECIMAL(10, 5), which requires 6 bytes of storage space. However, you can multiply your number by 100 000 and store it in an INTEGER UNSIGNED instead, which requires only 4 bytes of memory. The drawback is a slower response as you need to do calculations every time you insert or retrieve a value. The most practical way is by using a VIRTUAL column, which will not be stored:

ALTER TABLE table_name
ADD integer_name INT UNSIGNED NOT NULL,
ADD decimal_name DECIMAL(10,5) AS (integer_name / 100000) VIRTUAL AFTER integer_name;

So you will find the correct value in the VIRTUAL column.

Exact Real

Similarly, you can store an «EXACT REAL» by storing the significand and the exponent of a floating number, and use a VIRTUAL column as DECIMAL. To store something similar to a FLOAT, one could do the following:

ALTER TABLE table_name
ADD significand_name DECIMAL(7,0) NOT NULL,
ADD exponent_name TINYINT NOT NULL AFTER significand_name,
ADD exact_real_name DECIMAL(65,32)
    AS (
        -- Convert the base and the exponent to DECIMAL 
        IF(exponent_name >= 0,
            -- Add zeros as necessary to string '1' and convert to DECIMAL
            CAST(CONCAT('1',REPEAT('0',exponent_name)) AS DECIMAL(65,0)),
            -- Add zeros as necessary to string decimal and convert to DECIMAL
            CAST(CONCAT('0.',REPEAT('0',ABS(exponent_name) - 1),'1') as DECIMAL(38,38))
        )
        *  -- Multiply by the significand (converted to DECIMAL)
        CAST(significand_name as DECIMAL(7,0))
    ) VIRTUAL AFTER exponent_name;

This will give a range of -9.999 999e32 to -1e-32, 0 and 1e-32 to 9.999 999e32, with only 5 bytes of storage space. But, as oppose to FLOAT, if you multiply 2 of these EXACT REALs with value of 0.1 together, you will end up with 0.01000000000000000000000000000000000000 instead of 0.010000000298023226.

Do not use the built-in function POW() to do the transformation as it will return an inaccurate double precision number. The values used in the transformation must be DECIMALs such that the result is also a DECIMAL.

Here's what a function in PHP might look like to get the significand and the exponent from a number to fit in the columns of our example:

$number = 123456789;

$exponent = floor(log10(abs($number))) - 6;
$significand = $number / pow(10, $exponent);

// $significand = 1234567.89
// $exponent = 2;

// $significand will be automatically rounded
// to nearest DECIMAL(7,0) by MariaDB upon INSERT or UPDATE
2-D Positioning

Storing latitude and longitude coordinates can easily be done with 7-digit precision INTEGERs or DECIMAL data types.

Unfortunately, these data types can only be used with 1-D indexes, namely B-tree data structure. You are much better off with SPATIAL INDEX, which uses a R-tree data structure, specifically designed for sorting multi-dimensional data. The data type needed for this is POINT which stores both variables with double precision (and some other information), which leads to a 25 bytes storage requirement. That is the trade-off you have to make to be able to use the right tools.

Array

There are no «array» types in MariaDB. Here are the main methods to represent arrays.

Reference Table

The most versatile way is to store the values in another table instead of a column. For example, say you have a table person like this one:

person
person_id first_name last_name date_of_birth
22 John Smith 1970-jan-01
36 Jane Doe 1977-sep-12

Now, you would like to add a column pet to represents the pets owned by each person. Because each person can have more than one pet, it is preferable to create a table pet instead and a reference table person_pet linking both tables together:

pet
pet_id name race
1 Mittens cat
2 Fido dog
3 Buster dog
person_pet
person_id pet_id
22 1
22 3
36 2

By appropriately joining the tables (for example, person JOIN person_pet USING(person_id) JOIN pet USING(pet_id)),you can get all pets with their owner's information.

If pet would not be used anywhere else than with table person, than all the columns found in table pet can be put into person_pet (except pet_id, which is not required anymore), to eliminate one table and one JOIN operation.

It is good practice to name the reference table with the «main» table name as the prefix. person_pet means «pet is an array "column" for table person». That way, when listed alphabetically, all reference tables will be following their respective «main» table (in a program like phpMyAdmin or the like, for example). It is a lot easier to follow, especially when you have many reference tables linked to a single «main» table.

It is also good practice to name the ID column with the table name as the prefix, as shown above. This makes things a lot easier to follow when using the ID in more than one table. This also let us use the form table_1 JOIN table_2 USING(table_1_id) where MariaDB knows that table_1_id found in both tables refers to the same value and only returns one copy, if asked to. You can even use the simpler form table_1 NATURAL JOIN table_2, where MariaDB looks automatically for the column names (there can be more than one, so be careful) shared by both tables.

Bitwise Technique

If you have multiple choices between a given set of values (equivalent to checkboxes within a form), you can use this technique that only uses a single INTEGER column, thus very efficient speed- and memory-wise. This is the method used with the SET data type.

Logically link an integer with each possible value, say a string, but only with numbers containing only one bit set to "1" in binary form:

IntegerBinaryValue
1 ( = 20 )00000001"string1"
2 ( = 21 )00000010"string2"
4 ( = 22 )00000100"string3"
8 ( = 23 )00001000"string4"
.........

That way, the integer 00000101 (= 5) actually means "string1" and "string3" are selected. Of course, with BIGINT - with its maximum value of (264 - 1) - there are only 64 possible choices.

You can then use bit functions to look for desired values.

JSON

There are JSON functions available in MariaDB to manipulate JSON objects. So if you store a valid JSON object into any string data type (JSON data type exists for compatibility with MySQL, but it really is a LONGTEXT), you can use those functions to manipulate the data.

But remember that it might not extend as easily as reference tables do, if you wish to link data stored in JSON objects to other tables in the future.

References

MariaDB

MySQL