
Rust Time in SQLite - Text vs. Integer/UnixTime
When working with any database, there will always be a time when it comes to storing some date and time information. In most databases, there is typically a datetime
type that can be used for this purpose.
However, SQLite has a relatively simple type system and does not support any native datetime type. Fortunately, the SQLite type system has good support for a set of primitive types, which provides a few options for storing date and time information effectively.
Here are the two top decisions to make when storing date and time in SQLite:
-
Which SQLite type to use among the available types:
INTEGER
(up to 8 bytes, so, can bei64
; SQLite decides the appropriate size).REAL
(8 bytes, so, can bef64
).TEXT
(by defaultUTF-8
, virtually any size).BLOB
(essentially a byte array, virtually any size).
-
In what format to store it:
- For
TEXT
, we would likely use an ISO 8601 profile such asRFC 3339
. - For
INTEGER
, we would typically use UnixTime with defined precision (UnixTime Epoch0
starts on(UTC) on January 1, 1970.
). - For
REAL
, the Julian Date withf64
could be used, where the integer part represents the number of days since the initial epoch (noon Universal Time (UT) Monday, January 1, 4713 BC, in the Julian calendar), and the fractional part represents the fraction of a day since the preceding noon. - For
BLOB
, we could store it as a fixed-size blob representing[YYYY,MM,DD,HH,mm,ss,sss]
. This approach follows the binary format of the time library used, facilitating serialization and deserialization from and to the database.
- For
Regardless of the types and format, a typical good practice is to always store time in UTC to avoid any confusion or overhead.
Now, among all those options, there are two top choices that work quite well: the TEXT/ISO 8601 option and the Unix UnixTime as Integer option. Regardless of which one we choose, it is usually a good idea to normalize all the time storage types/formats across the application code.
Below, we will go over these two options and explain our rationale for choosing Option 2) for our codebase, concluding with some Rust code examples for that choice.
Option 1) Type: TEXT
,
Format: IS0 8601
One common option is to store the date in TEXT
type as a ISO 8601 profile such as RFC 3339
(e.g., 2024-02-10T19:26:35.884943Z
).
Pros:
- Simple: Easy to understand what's going in and out. Uses common formats.
- Sortable: Many ISO 8601 formats preserve lexicographical order.
- Human Readable: One advantage of this approach is that the storage format is fully descriptive and readable without any additional contextual information.
- Common: Many libraries,
rustqlite (with-time)
use this format.
Cons:
- Storage Cost: The string represention is not really space efficient. This might or might not be an issue. For example, a micro precision RFC3339 will take
26 bytes
, whereas theinteger
on a micro second precision will fit in a i64 and therefore take8 bytes
. - Indexing Cost: TEXT data requires a character by character comparison which can be slower, especially for longer strings.
- DataFrame Unfriendly: If, in the future, times are used in data frames for high-volume analytics, such as with Polars, then strings tend to be even more suboptimal, especially when they have a high uniqueness ratio.
Option 2) Type: INTEGER
,
Format: UnixTime µs
Another common approach is to store the date as an INTEGER
representing Unix Time (Epoch time starts on January 1, 1970, 00:00:00 UTC) and decide on a precision.
Most databases do not really support u64
in their numeric types, including SQLite, which means we are limited to i64
.
Fortunately, i64
, even at microsecond precision, is more than sufficient to store time. The i64::MAX
value will be approximately 292,277-01-09... UTC
, which is practically infinite for our timescale. And since we also have negative numbers, we can go back in time quite far. But assuming we don't need to keep track of dinosaurs, we should be fine.
Pros:
- Simple: This approach is also simple. In fact, from a data standpoint, it can be considered simpler because, aside from endianness (which is handled at a much lower level), no processing is needed to extract the value.
- Storage Efficient: An
i64
uses8 bytes
, making it more space-efficient than the27 bytes
or so required for the TEXT format. - Indexing Efficient: Comparison operations are faster and more straightforward because they involve direct numerical comparisons.
- DataFrame Efficient: Numerical types like
i64
are typically much more efficient for operations in data frames, facilitating quicker analytics and processing.
Cons:
- Not Human Readable: Those long numbers are not really readable to the average person, requiring an extra step to display them to the user, even for an DevOps or Developer.
- Need for Precision Context: Since Unix times can typically be in seconds, milliseconds, or microseconds, the precision context must be provided or agreed upon to make sense of the data.
- Nanoseconds Limit: While it should not be an issue for most applications, nanoseconds precision would have an upper date limit of
2262-04-11...
. - Not as Common: Especially considering microseconds precision, this method is not a common way to store time in a database, and as a result, most database libraries do not support it.
- However, due to the simplicity of this format, being just a single
i64
, this is not really a problem, as it could be easily transferred to the client side in JSON, allowing the client to convert it to/from a human-readable date.
- However, due to the simplicity of this format, being just a single
Other types / formats
While the float with Julian Date and the BLOB custom format can be considered and might be valid in some cases, they tend to be a bit more specialized and even less common than the two options mentioned above, without providing significant additional value.
Feel free to discuss in the Discord link at the top of the post if you have a different perspective.
Conclusion and Decision
The decision to choose one of the options above can be considered subjective and more of a preference, as both options will work well regardless of the choice.
That being said, for our coding approach, we think that the Option 2), (i.e. INTEGER / UnixTime µs
) is the one that have the highest net value.
Here are the main reasons:
- Efficiency: As discussed, from a database perspective, using this method offers peak efficiency for storage, indexing, and memory, whether in SQLite or DataFrame environments.
- Simplicity: Although
RFC 3339
is relatively simple, ani64
format is even more straightforward to manage. It can easily be encapsulated in a tuple structUnixTimeUs
, complete with appropriate dereferencing and serialization annotations. By standardizing our code to store all times with unix time at microsecond precision, both the code and all serialization/deserialization processes become very straightforward. - Human Readability: We tackle this issue by incorporating a renderer into our
db().debug_print_rows(..)
function, which displays those cells in RFC 3339 format for debugging purposes.- It's important to note that, for displaying data to end-users, this data will still need to be reformatted even if the dates were stored in
RFC 3339
. Thus, starting with a number format is both faster and simpler.
- It's important to note that, for displaying data to end-users, this data will still need to be reformatted even if the dates were stored in
Due to the reasons mentioned above, Option 2 shifts our decision reasonning from "Why?" to "Why not?". This means we would need to identify compelling reasons not to use it, rather than justifying its use over other options.
That being said, different architectural and code design approaches necessitate different decisions. Therefore, we are not suggesting that this is the universally best choice for all scenarios. Follow what you and your team deem appropriate for your project, as both options are valid and can scale relatively well.
Side Note:
In our Rust code, we like to wrap those primitive type i64
into a single tuple struct to give them a little bit more sementic in our application code. So, we have a type UnixTimeUs
(Us
for micro seconds) as
#[derive(Clone, Copy, PartialEq, Eq, Debug, Deref, From, Into
Field, FromSqliteValue,Serialize, Deserialize)]
#[cfg_attr(feature = "for-ts", derive(schemars::JsonSchema))]
pub struct UnixTimeUs(i64);
And this is allowing us to have:
pub struct MyData {
// ...
pub ctime: UnixTimeUs, // creation time.
pub mtime: UnixTimeUs, // last modification time.
}
What about PostgreSQL?
PostgreSQL does have a datetime
type, so we typically use this type (except when we have a sync from an SQLite database ongoing).
Personal Note: That being said, the end-to-end simplicity of the
UnixTime µs / i64
format is quite appealing to use even in the case of PostgreSQL. And while it might seem "wrong" to not use the native PostgreSQLdatetime
type, I am not 100% sure it would be a bad design choice.
Note: There is a slight nuance in
PostgreSQL
andRust SQLx
where if we do not define the time type to include a timezone, we might encounter serialization/deserialization issues from the database, even though PostgreSQL recommends having datetime without a timestamp. This is a minor issue and does not affect the decision to use the native PostgreSQL datetime type.
Rust Code Snippets for generating/parsing UnixTime µs
Here are some code snippets we are using to generate our UnixTime µs
.
Note: The
RFC 3339
code example is just that—an example. Typically, we stream thei64
Unix Time format directly to the client and handle user time formatting on the client side (e.g., with JS/TS in the case of a web browser).
now_unix_time_us
- Current time in microsecond precision
use std::time::{Duration, SystemTime, UNIX_EPOCH};
/// Returns the Unix Time in microseconds.
///
/// Note 1: If there is any error with `duration_since UNIX_EPOCH` (which should almost never happen),
/// it returns the start of the EPOCH.
/// Note 2: The maximum UTC epoch date that can be stored in i64 with microseconds precision
/// would be approximately `292277-01-09 ... UTC`.
/// Thus, for all practical purposes, it is sufficiently distant to be of no concern.
pub fn now_unix_time_us() -> i64 {
let now = SystemTime::now();
let since_the_epoch = now.duration_since(UNIX_EPOCH).unwrap_or(Duration::new(0, 0));
since_the_epoch.as_micros().min(i64::MAX as u128) as i64
}
unix_time_us_to_rfc3339
- UnixTime microseconds to Rfc3339 String
/// Returns the RFC3339 format of an Unix Time in microseconds.
///
/// Note 1: If anything fails, it will return the start of the EPOCH formatted as RFC3339 "1970-01-01T00:00:00Z".
/// Other strategies can be valid.
pub fn unix_time_us_to_rfc3339(unix_time_us: i64) -> String {
let unix_time_us: i128 = unix_time_us as i128 * 1000;
// In the unlikely event of a failure, return the start of the EPOCH.
let time = OffsetDateTime::from_unix_timestamp_nanos(unix_time_us).unwrap_or(OffsetDateTime::UNIX_EPOCH);
// This should probably never fail.
time.format(&Rfc3339).unwrap_or_else(|_| "1970-01-01T00:00:00Z".to_string())
}
rfc3339_to_unix_time_us
- Rfc3339 String to UnixTime microseconds
pub type Result<T> = core::result::Result<T, Error>;
pub type Error = Box<dyn std::error::Error>; // For early dev.
use time::format_description::well_known::Rfc3339;
use time::OffsetDateTime;
/// Returns the Unix Time in microseconds for a date-time string formatted in RFC3339.
///
/// Note: Given the higher probability of error with this function,
/// it will return a `Result<T>`.
/// Other strategies may also be valid.
pub fn rfc3339_to_unix_time_us(time_rfc3339: &str) -> Result<i64> {
let time =
OffsetDateTime::parse(time_rfc3339, &Rfc3339).map_err(|_| format!("Fail to parse: {}", time_rfc3339))?;
let time_ms = time.unix_timestamp_nanos() / 1000;
let time_ms = time_ms.min(i64::MAX as i128) as i64;
Ok(time_ms)
}