Rust Time in SQLite - Text vs. Integer/UnixTime

Discord Thread about this post

When working with any database, there will always be a time when it comes to storing some date and time information. In most databases, there is typically a datetime type that can be used for this purpose.

However, SQLite has a relatively simple type system and does not support any native datetime type. Fortunately, the SQLite type system has good support for a set of primitive types, which provides a few options for storing date and time information effectively.

Here are the two top decisions to make when storing date and time in SQLite:

Which SQLite type to use among the available types:
- INTEGER (up to 8 bytes, so, can be i64; SQLite decides the appropriate size).
- REAL (8 bytes, so, can be f64).
- TEXT (by default UTF-8, virtually any size).
- BLOB (essentially a byte array, virtually any size).
In what format to store it:
- For TEXT, we would likely use an ISO 8601 profile such as RFC 3339.
- For INTEGER, we would typically use UnixTime with defined precision (UnixTime Epoch 0 starts on (UTC) on January 1, 1970.).
- For REAL, the Julian Date with f64 could be used, where the integer part represents the number of days since the initial epoch (noon Universal Time (UT) Monday, January 1, 4713 BC, in the Julian calendar), and the fractional part represents the fraction of a day since the preceding noon.
- For BLOB, we could store it as a fixed-size blob representing [YYYY,MM,DD,HH,mm,ss,sss]. This approach follows the binary format of the time library used, facilitating serialization and deserialization from and to the database.

Regardless of the types and format, a typical good practice is to always store time in UTC to avoid any confusion or overhead.

Now, among all those options, there are two top choices that work quite well: the TEXT/ISO 8601 option and the Unix UnixTime as Integer option. Regardless of which one we choose, it is usually a good idea to normalize all the time storage types/formats across the application code.

Below, we will go over these two options and explain our rationale for choosing Option 2) for our codebase, concluding with some Rust code examples for that choice.

Option 1) Type: `TEXT`, Format: `IS0 8601`

One common option is to store the date in TEXT type as a ISO 8601 profile such as RFC 3339 (e.g., 2024-02-10T19:26:35.884943Z).

Pros:

Simple: Easy to understand what's going in and out. Uses common formats.
Sortable: Many ISO 8601 formats preserve lexicographical order.
Human Readable: One advantage of this approach is that the storage format is fully descriptive and readable without any additional contextual information.
Common: Many libraries, rustqlite (with-time) use this format.

Cons:

Storage Cost: The string represention is not really space efficient. This might or might not be an issue. For example, a micro precision RFC3339 will take 26 bytes, whereas the integer on a micro second precision will fit in a i64 and therefore take 8 bytes.
Indexing Cost: TEXT data requires a character by character comparison which can be slower, especially for longer strings.
DataFrame Unfriendly: If, in the future, times are used in data frames for high-volume analytics, such as with Polars, then strings tend to be even more suboptimal, especially when they have a high uniqueness ratio.

Option 2) Type: `INTEGER`, Format: `UnixTime µs`

Another common approach is to store the date as an INTEGER representing Unix Time (Epoch time starts on January 1, 1970, 00:00:00 UTC) and decide on a precision.

Most databases do not really support u64 in their numeric types, including SQLite, which means we are limited to i64.

Fortunately, i64, even at microsecond precision, is more than sufficient to store time. The i64::MAX value will be approximately 292,277-01-09... UTC, which is practically infinite for our timescale. And since we also have negative numbers, we can go back in time quite far. But assuming we don't need to keep track of dinosaurs, we should be fine.

Pros:

Simple: This approach is also simple. In fact, from a data standpoint, it can be considered simpler because, aside from endianness (which is handled at a much lower level), no processing is needed to extract the value.
Storage Efficient: An i64 uses 8 bytes, making it more space-efficient than the 27 bytes or so required for the TEXT format.
Indexing Efficient: Comparison operations are faster and more straightforward because they involve direct numerical comparisons.
DataFrame Efficient: Numerical types like i64 are typically much more efficient for operations in data frames, facilitating quicker analytics and processing.

Cons:

Not Human Readable: Those long numbers are not really readable to the average person, requiring an extra step to display them to the user, even for an DevOps or Developer.
Need for Precision Context: Since Unix times can typically be in seconds, milliseconds, or microseconds, the precision context must be provided or agreed upon to make sense of the data.
Nanoseconds Limit: While it should not be an issue for most applications, nanoseconds precision would have an upper date limit of 2262-04-11....
Not as Common: Especially considering microseconds precision, this method is not a common way to store time in a database, and as a result, most database libraries do not support it.
- However, due to the simplicity of this format, being just a single i64, this is not really a problem, as it could be easily transferred to the client side in JSON, allowing the client to convert it to/from a human-readable date.

Other types / formats

While the float with Julian Date and the BLOB custom format can be considered and might be valid in some cases, they tend to be a bit more specialized and even less common than the two options mentioned above, without providing significant additional value.

Feel free to discuss in the Discord link at the top of the post if you have a different perspective.

Conclusion and Decision

The decision to choose one of the options above can be considered subjective and more of a preference, as both options will work well regardless of the choice.

That being said, for our coding approach, we think that the Option 2), (i.e. INTEGER / UnixTime µs) is the one that have the highest net value.

Here are the main reasons:

Efficiency: As discussed, from a database perspective, using this method offers peak efficiency for storage, indexing, and memory, whether in SQLite or DataFrame environments.
Simplicity: Although RFC 3339 is relatively simple, an i64 format is even more straightforward to manage. It can easily be encapsulated in a tuple struct UnixTimeUs, complete with appropriate dereferencing and serialization annotations. By standardizing our code to store all times with unix time at microsecond precision, both the code and all serialization/deserialization processes become very straightforward.
Human Readability: We tackle this issue by incorporating a renderer into our db().debug_print_rows(..) function, which displays those cells in RFC 3339 format for debugging purposes.
- It's important to note that, for displaying data to end-users, this data will still need to be reformatted even if the dates were stored in RFC 3339. Thus, starting with a number format is both faster and simpler.

Due to the reasons mentioned above, Option 2 shifts our decision reasonning from "Why?" to "Why not?". This means we would need to identify compelling reasons not to use it, rather than justifying its use over other options.

That being said, different architectural and code design approaches necessitate different decisions. Therefore, we are not suggesting that this is the universally best choice for all scenarios. Follow what you and your team deem appropriate for your project, as both options are valid and can scale relatively well.

Side Note:

In our Rust code, we like to wrap those primitive type i64 into a single tuple struct to give them a little bit more sementic in our application code. So, we have a type UnixTimeUs (Us for micro seconds) as

#[derive(Clone, Copy, PartialEq, Eq, Debug, Deref, From, Into
          Field, FromSqliteValue,Serialize, Deserialize)]
#[cfg_attr(feature = "for-ts", derive(schemars::JsonSchema))]
pub struct UnixTimeUs(i64);

And this is allowing us to have:

pub struct MyData {
    // ...

    pub ctime: UnixTimeUs, // creation time.
	pub mtime: UnixTimeUs, // last modification time.
}

What about PostgreSQL?

PostgreSQL does have a datetime type, so we typically use this type (except when we have a sync from an SQLite database ongoing).

Personal Note: That being said, the end-to-end simplicity of the UnixTime µs / i64 format is quite appealing to use even in the case of PostgreSQL. And while it might seem "wrong" to not use the native PostgreSQL datetime type, I am not 100% sure it would be a bad design choice.

Note: There is a slight nuance in PostgreSQL and Rust SQLx where if we do not define the time type to include a timezone, we might encounter serialization/deserialization issues from the database, even though PostgreSQL recommends having datetime without a timestamp. This is a minor issue and does not affect the decision to use the native PostgreSQL datetime type.

Rust Code Snippets for generating/parsing `UnixTime µs`

Here are some code snippets we are using to generate our UnixTime µs.

Note: The RFC 3339 code example is just that—an example. Typically, we stream the i64 Unix Time format directly to the client and handle user time formatting on the client side (e.g., with JS/TS in the case of a web browser).

`now_unix_time_us` - Current time in microsecond precision

use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Returns the Unix Time in microseconds.
///
/// Note 1: If there is any error with `duration_since UNIX_EPOCH` (which should almost never happen),
///         it returns the start of the EPOCH.
/// Note 2: The maximum UTC epoch date that can be stored in i64 with microseconds precision
///         would be approximately `292277-01-09 ... UTC`.
///         Thus, for all practical purposes, it is sufficiently distant to be of no concern.
pub fn now_unix_time_us() -> i64 {
    let now = SystemTime::now();
    let since_the_epoch = now.duration_since(UNIX_EPOCH).unwrap_or(Duration::new(0, 0));

    since_the_epoch.as_micros().min(i64::MAX as u128) as i64
}

`unix_time_us_to_rfc3339` - UnixTime microseconds to Rfc3339 String

/// Returns the RFC3339 format of an Unix Time in microseconds.
///
/// Note 1: If anything fails, it will return the start of the EPOCH formatted as RFC3339 "1970-01-01T00:00:00Z".
///         Other strategies can be valid.
pub fn unix_time_us_to_rfc3339(unix_time_us: i64) -> String {
	let unix_time_us: i128 = unix_time_us as i128 * 1000;

	// In the unlikely event of a failure, return the start of the EPOCH.
	let time = OffsetDateTime::from_unix_timestamp_nanos(unix_time_us).unwrap_or(OffsetDateTime::UNIX_EPOCH);

	// This should probably never fail.
	time.format(&Rfc3339).unwrap_or_else(|_| "1970-01-01T00:00:00Z".to_string())
}

`rfc3339_to_unix_time_us` - Rfc3339 String to UnixTime microseconds

pub type Result<T> = core::result::Result<T, Error>;
pub type Error = Box<dyn std::error::Error>; // For early dev.

use time::format_description::well_known::Rfc3339;
use time::OffsetDateTime;

/// Returns the Unix Time in microseconds for a date-time string formatted in RFC3339.
///
/// Note: Given the higher probability of error with this function,
///       it will return a `Result<T>`.
///       Other strategies may also be valid.
pub fn rfc3339_to_unix_time_us(time_rfc3339: &str) -> Result<i64> {
    let time =
        OffsetDateTime::parse(time_rfc3339, &Rfc3339).map_err(|_| format!("Fail to parse: {}", time_rfc3339))?;
    let time_ms = time.unix_timestamp_nanos() / 1000;
    let time_ms = time_ms.min(i64::MAX as i128) as i64;
    Ok(time_ms)
}

Discord Thread about this post

Rust Blueprints

Best Practices & Patterns

Rust Language

Others

Rust Time in SQLite - Text vs. Integer/UnixTime

Option 1) Type: `TEXT`, Format: `IS0 8601`

Option 2) Type: `INTEGER`, Format: `UnixTime µs`

Other types / formats

Conclusion and Decision

What about PostgreSQL?

Rust Code Snippets for generating/parsing `UnixTime µs`

`now_unix_time_us` - Current time in microsecond precision

`unix_time_us_to_rfc3339` - UnixTime microseconds to Rfc3339 String

`rfc3339_to_unix_time_us` - Rfc3339 String to UnixTime microseconds

Rust Blueprints

Best Practices & Patterns

Rust Language

Others

Option 1) Type: TEXT, Format: IS0 8601

Option 2) Type: INTEGER, Format: UnixTime µs

Other types / formats

Conclusion and Decision

What about PostgreSQL?

Rust Code Snippets for generating/parsing UnixTime µs

now_unix_time_us - Current time in microsecond precision

unix_time_us_to_rfc3339 - UnixTime microseconds to Rfc3339 String

rfc3339_to_unix_time_us - Rfc3339 String to UnixTime microseconds

Option 1) Type: `TEXT`, Format: `IS0 8601`

Option 2) Type: `INTEGER`, Format: `UnixTime µs`

Rust Code Snippets for generating/parsing `UnixTime µs`

`now_unix_time_us` - Current time in microsecond precision

`unix_time_us_to_rfc3339` - UnixTime microseconds to Rfc3339 String

`rfc3339_to_unix_time_us` - Rfc3339 String to UnixTime microseconds