sqldatamodel

SQLDataModel

class sqldatamodel.sqldatamodel.SQLDataModel(data: list[list] = None, headers: list[str] = None, dtypes: dict[str, str] = None, display_max_rows: int = None, min_column_width: int = 3, max_column_width: int = 38, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic', display_color: str = None, display_index: bool = True, display_float_precision: int = 2, infer_types: bool = False, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = 'default')

Bases: object

SQLDataModel

Primary class for the package of the same name. It is meant to provide a fast, lightweight alternative to the common pandas, numpy and sqlalchemy setup for moving data in a source- and destination-agnostic manner. It is not an ORM: any modification beyond basic joins, group bys and table alteration requires knowledge of SQL. The primary use case envisaged by the package is one where a table needs to be ETL’d from location A to destination B, with arbitrary modifications made if needed:

Summary

  • Extract your data from SQL, websites or HTML, parquet, JSON, CSV, pandas, numpy, pickle, python dictionaries, lists, etc.

  • Transform your data using raw SQL or any number of built-in methods covering some of the most used pandas data methods.

  • Load your data to any number of destinations including popular SQL databases, CSV files, JSON, HTML, parquet, pickle, etc.
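The three steps above can be sketched with nothing but the standard library. This is a minimal illustration of the extract, transform and load pattern, not SQLDataModel's own API; the `staff` table, the sample CSV text and the JSON destination are all assumptions made for the example.

```python
import csv
import io
import json
import sqlite3

# Extract: parse CSV text into dictionaries (stands in for any source)
csv_text = "name,salary\nAlice,100\nBob,\nCara,300\n"
records = list(csv.DictReader(io.StringIO(csv_text)))

# Stage the rows in an in-memory SQLite table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [(r["name"], int(r["salary"]) if r["salary"] else None) for r in records],
)

# Transform: plain SQL drops missing salaries and doubles the rest
transformed = conn.execute(
    "SELECT name, salary * 2 FROM staff WHERE salary IS NOT NULL ORDER BY name"
).fetchall()
conn.close()

# Load: serialize to JSON (stands in for any destination)
payload = json.dumps([{"name": n, "salary": s} for n, s in transformed])
```

The point of the sketch is that the transform step is ordinary SQL, which is why knowledge of SQL is needed for anything beyond the built-in methods.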

Usage

import sqldatamodel as sdm

# Let's grab a random table from Wikipedia
df = sdm.from_html("https://en.wikipedia.org/wiki/FIFA_World_Cup", table_identifier=7)

# Let's see what we found
print(df)

This will output:

┌───────────────┬──────┬──────┬──────────┬──────────┬──────┬──────┬───────┐
│ Confederation │  AFC │  CAF │ CONCACAF │ CONMEBOL │  OFC │ UEFA │ Total │
├───────────────┼──────┼──────┼──────────┼──────────┼──────┼──────┼───────┤
│ Teams         │   43 │   49 │       46 │       89 │    4 │  258 │   489 │
│ Top 16        │    9 │   11 │       15 │       37 │    1 │   99 │   172 │
│ Top 8         │    2 │    4 │        5 │       36 │    0 │  105 │   152 │
│ Top 4         │    1 │    1 │        1 │       23 │    0 │   62 │    88 │
│ Top 2         │    0 │    0 │        0 │       15 │    0 │   29 │    44 │
│ 4th           │    1 │    1 │        0 │        5 │    0 │   15 │    22 │
│ 3rd           │    0 │    0 │        1 │        3 │    0 │   18 │    22 │
│ 2nd           │    0 │    0 │        0 │        5 │    0 │   17 │    22 │
│ 1st           │    0 │    0 │        0 │       10 │    0 │   12 │    22 │
└───────────────┴──────┴──────┴──────────┴──────────┴──────┴──────┴───────┘
[9 rows x 8 columns]

Example:

import sqlite3
from datetime import datetime

import pyodbc
import sqldatamodel as sdm

# For example, setup a source connection
source_db_conn = pyodbc.connect(...)

# A destination connection
destination_db_conn = sqlite3.connect(...)

# Grab your source table
df = sdm.from_sql("select * from source_table", source_db_conn)

# Modify it however you want, whether through plain SQL
df = df.execute_fetch('select "whatever", "i", "want" from "wherever_i_want" where "what_i_need" is not null ')

# Or through any number of built-in methods like filtering
df = df[df['create_date'] >= '2023-01-01']

# Or creating new columns
df['new_date'] = datetime.now()

# Or modifying existing ones
df['salary'] = df['salary'] * 2

# Or applying functions
df['user_id'] = df['user_id'].apply(lambda x: x**2)

# Or deduplicating
df = df.deduplicate(subset=['user_id','user_name'])

# Or iterate through it row-by-row and modify it
for idx, row in df.iter_tuples(index=True):
    if row['number'] % 2 == 0:
        row[idx,'odd_even'] = 'even'
    else:
        row[idx,'odd_even'] = 'odd'

# Or join it using any of the standard join operations
df = df_left.merge(df_right, how='left', left_on='id', right_on='id')

# Or group or aggregate the data:
df_agg = df.group_by(["first", "last", "position"])

# Or have your data imported and described for you
df = sdm.from_parquet('titanic.parquet').describe()

# View result
print(df)

This will output:

┌────────┬─────────────┬──────────┬────────┬────────┬───────┬────────┐
│ metric │ passengerid │ survived │ pclass │    sex │   age │   fare │
├────────┼─────────────┼──────────┼────────┼────────┼───────┼────────┤
│ count  │         891 │      891 │    891 │    891 │   714 │    891 │
│ unique │         891 │        2 │      3 │      2 │    88 │    248 │
│ top    │         891 │        0 │      3 │   male │    24 │   8.05 │
│ freq   │           1 │      549 │    491 │    577 │    30 │     43 │
│ mean   │         446 │        0 │      2 │    NaN │  29.7 │   32.2 │
│ std    │         257 │        0 │      0 │    NaN │ 14.53 │  49.69 │
│ min    │           1 │        0 │      1 │ female │  0.42 │      0 │
│ p25    │         223 │        0 │      2 │    NaN │     6 │    7.9 │
│ p50    │         446 │        0 │      3 │    NaN │    24 │  14.45 │
│ p75    │         669 │        1 │      3 │    NaN │    35 │     31 │
│ max    │         891 │        1 │      3 │   male │    80 │ 512.33 │
│ dtype  │         int │      int │    int │    str │ float │  float │
└────────┴─────────────┴──────────┴────────┴────────┴───────┴────────┘
[12 rows x 7 columns]

Move data quickly from one source or format to another:

# Load it to your destination database:
df.to_sql("new_table", destination_db_conn)

# Or any number of formats including:
df.to_csv("output.csv")
df.to_html("output.html")
df.to_json("output.json")
df.to_latex("output.tex")
df.to_markdown("output.md")
df.to_parquet("output.parquet")
df.to_pickle("output.sdm")
df.to_text("output.txt")
df.to_xml("output.xml")
df.to_local_db("output.db")

# Reload it back again from more formats:
df = sdm.from_csv("output.csv")
df = sdm.from_dict(py_dict)
df = sdm.from_html("output.html")
df = sdm.from_json("output.json")
df = sdm.from_latex("output.tex")
df = sdm.from_markdown("output.md")
df = sdm.from_numpy(np_arr)
df = sdm.from_pandas(pd_df)
df = sdm.from_polars(pl_df)
df = sdm.from_parquet("output.parquet")
df = sdm.from_pickle("output.sdm")
df = sdm.from_sql("output", sqlite3.connect('output.db'))
df = sdm.from_xml("output.xml")

Data Formats

SQLDataModel seamlessly interacts with a wide range of data formats providing a versatile platform for data extraction, conversion, and writing. Supported formats include:

  • Arrow: Convert to and from Apache Arrow format, pyarrow required.

  • CSV: Extract from and write to comma separated value, .csv, files.

  • Excel: Extract from and write to Excel .xlsx files, openpyxl required.

  • HTML: Extract from the web, and read from and write to .html files, including HTML formatted string literals.

  • JSON: Extract from and write to .json files, JSON-like objects, or JSON formatted string literals.

  • LaTeX: Extract from and write to .tex files, LaTeX formatted string literals.

  • Markdown: Extract from and write to .md files, Markdown formatted string literals.

  • Numpy: Convert to and from numpy.ndarray objects, numpy required.

  • Pandas: Convert to and from pandas.DataFrame objects, pandas required.

  • Parquet: Extract from and write to .parquet files, pyarrow required.

  • Pickle: Extract from and write to .pkl files, package uses .sdm extension when pickling for SQLDataModel metadata.

  • Polars: Convert to and from polars.DataFrame objects, polars required.

  • SQL: Extract from and write to the following popular SQL databases:

    • SQLite: Using the built-in sqlite3 module.

    • PostgreSQL: Using the psycopg2 package.

    • SQL Server: Using the pyodbc package.

    • Oracle: Using the cx_Oracle package.

    • Teradata: Using the teradatasql package.

  • Text: Read from and write to .txt files, including other SQLDataModel string representations.

  • TSV or delimited: Read from and write to files delimited by:

    • \t: Tab separated values or .tsv files.

    • \s: Single space or whitespace separated values.

    • ;: Semicolon separated values.

    • |: Pipe separated values.

    • :: Colon separated values.

    • ,: Comma separated values or .csv files.

  • XML: Extract from XML formats, and read from and write to .xml files, including XML formatted string literals.

  • Python objects:

    • dictionaries: Convert to and from collections of python dict objects.

    • lists: Convert to and from collections of python list objects.

    • tuples: Convert to and from collections of python tuple objects.

    • namedtuples: Convert to and from collections of namedtuple objects.
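For the delimited formats listed above, every delimiter is a single character, so the same parsing logic covers all of them. The sketch below uses the standard library csv module rather than SQLDataModel itself; `parse_delimited` and the sample text are hypothetical and only illustrate the idea.

```python
import csv
import io

def parse_delimited(text, delimiter):
    """Parse delimited text into a header row and data rows (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    headers, *rows = list(reader)
    return headers, rows

# One sample per delimiter named in the list above
samples = {
    '\t': "name\tage\nAlice\t20\n",
    ' ': "name age\nAlice 20\n",
    ';': "name;age\nAlice;20\n",
    '|': "name|age\nAlice|20\n",
    ':': "name:age\nAlice:20\n",
    ',': "name,age\nAlice,20\n",
}
parsed = {d: parse_delimited(text, d) for d, text in samples.items()}
```

Every entry parses to the same headers and rows. Note that the csv module returns all values as strings, so type inference remains a separate step.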

Pretty Printing

SQLDataModel also pretty prints your table in any color you specify. Use SQLDataModel.set_display_color() with either a hex value or a tuple of rgb values, then print the table. Example output:

┌───┬─────────────────────┬────────────┬─────────────┬────────┬─────────┐
│   │ full_name           │ date       │ country     │    pin │ service │
├───┼─────────────────────┼────────────┼─────────────┼────────┼─────────┤
│ 0 │ Pamela Berg         │ 2024-09-15 │ New Zealand │   3010 │    3.02 │
│ 1 │ Mason Hoover        │ 2024-01-23 │ Australia   │   6816 │    5.01 │
│ 2 │ Veda Suarez         │ 2023-09-04 │ Ukraine     │   1175 │    4.65 │
│ 3 │ Guinevere Cleveland │ 2024-03-22 │ New Zealand │   4962 │    3.81 │
│ 4 │ Vincent Mccoy       │ 2023-09-16 │ France      │   4446 │    2.95 │
│ 5 │ Holmes Kemp         │ 2024-11-13 │ Germany     │   9396 │    4.61 │
│ 6 │ Donna Mays          │ 2023-06-06 │ Costa Rica  │   8153 │    5.34 │
│ 7 │ Rama Galloway       │ 2023-09-22 │ Italy       │   3384 │    3.87 │
│ 8 │ Lucas Rodriquez     │ 2024-03-16 │ New Zealand │   3278 │    2.73 │
│ 9 │ Hunter Donaldson    │ 2023-06-30 │ Belgium     │   1593 │    4.58 │
└───┴─────────────────────┴────────────┴─────────────┴────────┴─────────┘
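How a hex value or an rgb tuple can end up as terminal color is sketched below using 24-bit ANSI escape sequences. This is an assumption about one possible approach, not a description of how SQLDataModel.set_display_color() is implemented; `to_rgb` and `colorize` are hypothetical helpers.

```python
def to_rgb(color):
    """Accept '#RRGGBB' or an (r, g, b) tuple and return an (r, g, b) tuple."""
    if isinstance(color, str):
        value = color.lstrip('#')
        if len(value) != 6:
            raise ValueError(f"expected a 6-digit hex color, got {color!r}")
        return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))
    r, g, b = color
    return (r, g, b)

def colorize(text, color):
    """Wrap text in a 24-bit ANSI foreground color sequence, then reset."""
    r, g, b = to_rgb(color)
    return f"\x1b[38;2;{r};{g};{b}m{text}\x1b[0m"

colored = colorize("│ 0 │ Pamela Berg │", "#A6D7E8")
```

Printing `colored` in a terminal with true color support shows the row in the requested color. Terminals without that support may ignore or misrender the sequence.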

Note

  • No additional dependencies are installed with this package; however, you will need pandas or numpy installed to create pandas or numpy objects.

  • Use SQLDataModel.set_display_color() to modify the terminal color of the table, by default no color styling is applied.

  • Use SQLDataModel.get_supported_sql_connections() to view supported SQL connection packages. Please reach out with any issues or questions, thanks!

__add__(value: str | int | float | SQLDataModel) → SQLDataModel

Implements the + operator functionality for compatible SQLDataModel operations.

Parameters:

value (str | int | float | SQLDataModel) – The value to be added to each element in the SQLDataModel.

Raises:
  • TypeError – If the provided value is not a valid type (str, int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as addition) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the addition operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar addition
df['x + 100'] = df['x'] + 100

# Perform vector addition using another column
df['x + y'] = df['x'] + df['y']

# View both results
print(df)

This will output:

┌─────┬─────┬─────────┬───────┐
│   x │   y │ x + 100 │ x + y │
├─────┼─────┼─────────┼───────┤
│   2 │  10 │     102 │    12 │
│   4 │  20 │     104 │    24 │
│   8 │  30 │     108 │    38 │
│  16 │  40 │     116 │    56 │
│  32 │  50 │     132 │    82 │
└─────┴─────┴─────────┴───────┘
[5 rows x 4 columns]

We can also use addition to concatenate strings:

import sqldatamodel as sdm

# Sample data
headers = ['First', 'Last']
data = [['Alice', 'Smith'],['Bob', 'Johnson'],['Charlie', 'Hall'],['David', 'Brown']]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Concatenate scalar character
df['Loud First'] = df['First'] + '!'

# Concatenate scalar and vector using existing columns
df['Full Name'] = df['First'] + ' ' + df['Last']

# View it
print(df)

This will output:

┌─────────┬─────────┬────────────┬──────────────┐
│ First   │ Last    │ Loud First │ Full Name    │
├─────────┼─────────┼────────────┼──────────────┤
│ Alice   │ Smith   │ Alice!     │ Alice Smith  │
│ Bob     │ Johnson │ Bob!       │ Bob Johnson  │
│ Charlie │ Hall    │ Charlie!   │ Charlie Hall │
│ David   │ Brown   │ David!     │ David Brown  │
└─────────┴─────────┴────────────┴──────────────┘
[4 rows x 4 columns]

Note

  • Mixing summands such as int + float will work; however, an exception will be raised when attempting to perform addition on incompatible types such as str + float.
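The scalar and vector cases, together with the type rule in the note, can be sketched in plain Python. This only illustrates the semantics described here and is not the library's implementation; `add_column` is a hypothetical helper, and a built-in ValueError stands in for DimensionError.

```python
def add_column(column, value):
    """Add a scalar or an equal-length list to a list of values (hypothetical helper)."""
    if isinstance(value, list):
        if len(value) != len(column):
            # Stand-in for the DimensionError described above
            raise ValueError(f"shape mismatch: {len(column)} vs {len(value)}")
        return [a + b for a, b in zip(column, value)]
    if not isinstance(value, (str, int, float)):
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return [a + value for a in column]

x = [2, 4, 8, 16, 32]
y = [10, 20, 30, 40, 50]

scalar_sum = add_column(x, 100)   # int + int
vector_sum = add_column(x, y)     # element-wise addition
mixed_sum = add_column(x, 0.5)    # int + float works

try:
    add_column(['Alice', 'Bob'], 1.5)  # str + float is incompatible
    str_float_ok = True
except TypeError:
    str_float_ok = False
```

The first two results match the `x + 100` and `x + y` columns in the output above.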

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__and__(other: SQLDataModel) → set[int]

Implements the bitwise AND operator & for combining the result sets of self and other.

Parameters:

other – The SQLDataModel to combine with.

Returns:

A set of indices representing the intersection of the result rows from both SQLDataModel instances.

Return type:

set[int]

Example:

import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Apply two filtering conditions to the model
filter_1 = df[df['Age'] <= 40]
filter_2 = df[df['Service'] > 2]

# Perform a bitwise AND operation to return a new model
result = df[filter_1 & filter_2]

# View result
print(result)

This will output the result of filtering by ‘Age’ and ‘Service’:

┌───┬───────┬────────┬─────┬─────────┬────────────┬────────┐
│   │ First │ Last   │ Age │ Service │ Hired      │ Gender │
├───┼───────┼────────┼─────┼─────────┼────────────┼────────┤
│ 1 │ Kelly │ Lee    │  32 │    8.00 │ 2016-09-18 │ Female │
│ 2 │ Mike  │ Harlin │  36 │    3.90 │ 2020-08-27 │ Male   │
└───┴───────┴────────┴─────┴─────────┴────────────┴────────┘
[2 rows x 6 columns]

Note

  • If other is not an instance of SQLDataModel, a NotImplementedError is raised to be consistent with current conventions.

  • See SQLDataModel.__or__() for bitwise OR operation.
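Because the result is a set of row indices, the combination behaves like ordinary set intersection. The sketch below reproduces the example's two conditions with plain Python sets. It is an illustration of the idea only and makes no claim about how the method is implemented.

```python
rows = [
    ('John', 27, 1.22),
    ('Kelly', 32, 8.0),
    ('Mike', 36, 3.9),
    ('Sarah', 51, 0.7),
    ('Pat', 42, 11.5),
]

# Each condition yields the set of matching row indices
age_idx = {i for i, (_, age, _) in enumerate(rows) if age <= 40}
service_idx = {i for i, (_, _, service) in enumerate(rows) if service > 2}

both = age_idx & service_idx     # intersection, as with the & operator
either = age_idx | service_idx   # union, as with the | operator

selected = [rows[i][0] for i in sorted(both)]
```

The intersection contains indices 1 and 2, the same rows (Kelly and Mike) shown in the output above.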

Changelog:
  • Version 0.7.4 (2024-06-13):
    • New method.

__bool__() → bool

Implements logical boolean operator for SQLDataModel using the current row count.

Returns:

True if SQLDataModel.row_count != 0, False otherwise.

Return type:

bool

Example:

import sqldatamodel as sdm

# Create an empty model
df = sdm.SQLDataModel(headers=['Stage', 'Match', 'Result'])

# Use boolean method to avoid duplicating result
if not df:
    df[0] = ['Group', 1, 'Scotland Win']
else:
    print('Match result already stored')

Note

  • This method is equivalent to df.row_count != 0

  • See SQLDataModel.__eq__() and related comparison methods for more details.
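The truthiness rule can be sketched with a small stand-in class. `Table` below is hypothetical and only mirrors the documented behavior, where the boolean value follows the row count.

```python
class Table:
    """Hypothetical stand-in for a model that tracks its rows."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])

    @property
    def row_count(self):
        return len(self.rows)

    def __bool__(self):
        # Truthiness follows the row count, as described above
        return self.row_count != 0

empty = Table()
filled = Table([('Group', 1, 'Scotland Win')])
```

With this in place, `if not empty:` runs its body while `if not filled:` does not, which is the pattern used in the example above.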

Changelog:
  • Version 0.7.1 (2024-06-09):
    • New method.

__eq__(other) → set[int]

Implements the is equal to operator == for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

The set of row indices resulting from the operation that satisfy the condition.

Return type:

set[int]

Example:

import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter by 'Gender' column
df = df[df['Gender'] == 'Female']

# View result
print(df)

This will output:

┌───┬───────┬──────┬──────┬─────────┬────────────┬────────┐
│   │ First │ Last │  Age │ Service │ Hired      │ Gender │
├───┼───────┼──────┼──────┼─────────┼────────────┼────────┤
│ 0 │ Kelly │ Lee  │   32 │    8.00 │ 2016-09-18 │ Female │
│ 1 │ Sarah │ West │   51 │    0.70 │ 2023-10-01 │ Female │
└───┴───────┴──────┴──────┴─────────┴────────────┴────────┘
[2 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.

  • All comparison operations return a python set object containing the row indices that satisfy the condition.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.
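The need for parentheses follows from Python's operator precedence: & and | bind more tightly than comparison operators. The sketch below uses a hypothetical `Column` class whose comparisons return sets of row indices, mirroring the behavior described in the notes. It is not the library's implementation.

```python
class Column:
    """Hypothetical column whose comparisons return sets of row indices."""

    def __init__(self, values):
        self.values = values

    def __eq__(self, other):
        return {i for i, v in enumerate(self.values) if v == other}

    def __ge__(self, other):
        return {i for i, v in enumerate(self.values) if v >= other}

    def __gt__(self, other):
        return {i for i, v in enumerate(self.values) if v > other}

gender = Column(['Male', 'Female', 'Male', 'Female', 'Male'])
age = Column([27, 32, 36, 51, 42])

# Parentheses make each comparison evaluate before the set operator
female_and_40_plus = (gender == 'Female') & (age >= 40)
female_or_over_40 = (gender == 'Female') | (age > 40)
```

Without the parentheses, `gender == 'Female' & age >= 40` would make Python evaluate `'Female' & age` first, which is not the intended comparison.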

__floordiv__(value: int | float | SQLDataModel) → SQLDataModel

Implements the // operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

  • ZeroDivisionError – If value is 0.

Returns:

A new SQLDataModel resulting from the floor division operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar floor division
df['y // 10'] = df['y'] // 10

# Perform vector floor division using another column
df['y // x'] = df['y'] // df['x']

# View both results
print(df)

This will output:

┌─────┬─────┬─────────┬────────┐
│   x │   y │ y // 10 │ y // x │
├─────┼─────┼─────────┼────────┤
│   2 │  10 │       1 │      5 │
│   4 │  20 │       2 │      5 │
│   8 │  30 │       3 │      3 │
│  16 │  40 │       4 │      2 │
│  32 │  50 │       5 │      1 │
└─────┴─────┴─────────┴────────┘
[5 rows x 4 columns]

Note

  • Mixing divisor types such as int // float will work; however, an exception will be raised when attempting to perform division on incompatible types such as str // float.
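For reference, the sketch below shows how floor division behaves on plain Python values, using the same sample data. It describes standard Python semantics only; whether SQLDataModel matches Python in every edge case (for example negative operands) is not confirmed here.

```python
x = [2, 4, 8, 16, 32]
y = [10, 20, 30, 40, 50]

scalar_result = [v // 10 for v in y]             # scalar divisor
vector_result = [b // a for a, b in zip(x, y)]   # element-wise divisor

# Mixing int and float works, but the result is a float
mixed_result = [v // 4.0 for v in y]

# Floor division rounds toward negative infinity, not toward zero
negative_result = -7 // 2

try:
    10 // 0
    zero_division_raised = False
except ZeroDivisionError:
    zero_division_raised = True
```

The scalar and vector results match the `y // 10` and `y // x` columns in the output above.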

Changelog:
  • Version 0.2.2 (2024-03-26):
    • New method.

__ge__(other) → set[int]

Implements the greater than or equal to operator >= for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

The set of row indices resulting from the operation that satisfy the condition.

Return type:

set[int]

Example:

import datetime

import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter by 'Hired' column
df = df[df['Hired'] >= datetime.date(2020,1,1)]

# View result
print(df)

This will output:

┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐
│   │ First │ Last   │  Age │ Service │ Hired      │ Gender │
├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ John  │ Smith  │   27 │    1.22 │ 2023-02-01 │ Male   │
│ 1 │ Mike  │ Harlin │   36 │    3.90 │ 2020-08-27 │ Male   │
│ 2 │ Sarah │ West   │   51 │    0.70 │ 2023-10-01 │ Female │
└───┴───────┴────────┴──────┴─────────┴────────────┴────────┘
[3 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.

  • All comparison operations return a python set object containing the row indices that satisfy the condition.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__getitem__(target_indicies) → SQLDataModel

Retrieves a subset of the SQLDataModel based on the specified indices.

Parameters:

target_indicies – Indices specifying the rows and columns to be retrieved. This can be an integer, a tuple, a slice, or a combination of these.

Raises:
  • ValueError – If there are issues with the specified indices, such as invalid row or column names.

  • TypeError – If the target_indicies type is not compatible with indexing SQLDataModel.

  • IndexError – If target_indicies includes a range or int that is outside of the current row count or column count.

Returns:

An instance of SQLDataModel containing the selected subset of data.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Create a sample model
headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Retrieve a specific row by index
subset_model = df[3]

# Retrieve multiple rows and specific columns using a tuple
subset_model = df[(1, 2, 4), ["First", "Service", "Age"]]

# Retrieve a range of rows and all columns using a slice
subset_model = df[1:4]

# Retrieve a single column by name
subset_model = df["First"]

Note

  • The target_indicies parameter can be an integer, a tuple of disconnected row indices, a slice representing a range of rows, a string or list of strings representing column names, or a tuple combining row and column indices.

  • The returned SQLDataModel instance will contain the specified subset of rows and columns, retaining the row indices of the original view.
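How the different row index forms might be normalized is sketched below. `resolve_rows` is a hypothetical helper that covers only the row part of the key (integer, slice, or a collection of indices). It illustrates the documented error cases and is not the library's actual logic.

```python
def resolve_rows(key, row_count):
    """Return the sorted row positions selected by key (hypothetical helper)."""
    if isinstance(key, int):
        if not 0 <= key < row_count:
            raise IndexError(f"row {key} out of range for {row_count} rows")
        return [key]
    if isinstance(key, slice):
        # slice.indices clamps the bounds to the available rows
        return list(range(*key.indices(row_count)))
    if isinstance(key, (tuple, list, set)):
        rows = sorted(key)
        if any(not 0 <= i < row_count for i in rows):
            raise IndexError("row index out of range")
        return rows
    raise TypeError(f"unsupported index type: {type(key).__name__}")

single = resolve_rows(3, 5)
ranged = resolve_rows(slice(1, 4), 5)
picked = resolve_rows((1, 2, 4), 5)
from_filter = resolve_rows({2, 1}, 5)
```

Accepting a set matters because the comparison operators return sets of row indices, which is what makes `df[df['Age'] <= 40]` style filtering possible.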

Changelog:
  • Version 0.5.0 (2024-05-09):
    • Modified index retention behavior to pass through row indicies and avoid resetting view order.

__gt__(other) → set[int]

Implements the greater than operator > for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

The set of row indices resulting from the operation that satisfy the condition.

Return type:

set[int]

Example:

import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter by 'Service' column
df = df[df['Service'] > 5.0]

# View result
print(df)

This will output:

┌───┬───────┬─────────┬──────┬─────────┬────────────┬────────┐
│   │ First │ Last    │  Age │ Service │ Hired      │ Gender │
├───┼───────┼─────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │ Female │
│ 1 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │ Male   │
└───┴───────┴─────────┴──────┴─────────┴────────────┴────────┘
[2 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.

  • All comparison operations return a python set object containing the row indices that satisfy the condition.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__iadd__(value: str | int | float | SQLDataModel) → SQLDataModel

Implements the += operator functionality for compatible SQLDataModel operations.

Parameters:

value (str | int | float | SQLDataModel) – The value to be added to each element in the SQLDataModel.

Raises:

TypeError – If the provided value is not a valid type (str, int, float, or SQLDataModel).

Returns:

The modified SQLDataModel after the addition operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['idx', 'first', 'last', 'age', 'service']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 39, 0.7),
    (2, 'mike', 'harlin', 36, 3),
    (3, 'pat', 'douglas', 42, 11.5)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Modifying first name column with a bang!
df['first'] += '!'

# View model
print(df)

This will output:

┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john!  │ smith   │     27 │    1.22 │
│ 1 │ sarah! │ west    │     39 │    0.70 │
│ 2 │ mike!  │ harlin  │     36 │    3.00 │
│ 3 │ pat!   │ douglas │     42 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.
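The difference between the + and += operators can be sketched with a small stand-in class. `Values` is hypothetical and only illustrates the general Python convention that `__add__` returns a new object while `__iadd__` modifies and returns the existing one. It is not SQLDataModel's implementation.

```python
class Values:
    """Hypothetical container supporting + and += element-wise."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, value):
        # Returns a new object and leaves self untouched
        return Values(v + value for v in self.items)

    def __iadd__(self, value):
        # Modifies self in place and returns it
        self.items = [v + value for v in self.items]
        return self

names = Values(['john', 'sarah'])
alias = names
names += '!'            # in place: alias sees the change
louder = names + '?'    # new object: names is unchanged by this
```

After `names += '!'`, both `names` and `alias` refer to the same modified object, whereas `louder` is a separate object.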

__idiv__(value: int | float | SQLDataModel) → SQLDataModel

Implements the /= operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • ZeroDivisionError – If value of divisor is 0.

Returns:

The modified SQLDataModel after the division operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Budget'])

# Adjust existing column
df['Budget'] /= 52

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__ifloordiv__(value: int | float | SQLDataModel) → SQLDataModel

Implements the //= operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • ZeroDivisionError – If value is 0.

Returns:

The modified SQLDataModel after the floor division operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x']
data = [[10],[20],[30],[40],[50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Modify the existing column
df['x'] //= 3

# View result
print(df)

This will output:

┌───┬──────┐
│   │    x │
├───┼──────┤
│ 0 │    3 │
│ 1 │    6 │
│ 2 │   10 │
│ 3 │   13 │
│ 4 │   16 │
└───┴──────┘
[5 rows x 1 columns]

Changelog:
  • Version 0.2.2 (2024-03-26):
    • New method.

__imul__(value: int | float | SQLDataModel) → SQLDataModel

Implements the *= operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to multiply each element in the SQLDataModel by.

Raises:

TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

Returns:

The modified SQLDataModel after the multiplication operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Salary'])

# Give raises to all!
df['Salary'] *= 12

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__init__(data: list[list] = None, headers: list[str] = None, dtypes: dict[str, str] = None, display_max_rows: int = None, min_column_width: int = 3, max_column_width: int = 38, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic', display_color: str = None, display_index: bool = True, display_float_precision: int = 2, infer_types: bool = False, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = 'default')

Initializes a new instance of SQLDataModel.

Parameters:
  • data (list[list]) – The data to populate the model. Should be a list of lists, a list of tuples, or a dictionary oriented by rows or columns.

  • headers (list[str], optional) – The column headers for the model. If not provided, default headers will be used.

  • dtypes (dict, optional) – A dictionary specifying the data types for each column. Format: {‘column’: ‘dtype’}.

  • display_max_rows (int, optional) – The maximum number of rows to display. Default is None, using terminal height to format number of rows.

  • min_column_width (int, optional) – The minimum width for each column. Default is 3.

  • max_column_width (int, optional) – The maximum width for each column. Default is 38.

  • column_alignment (str, optional) – The alignment for columns, must be ‘dynamic’, ‘left’, ‘center’ or ‘right’. Default is ‘dynamic’.

  • display_color (str|tuple, optional) – The color for display as hex code string or rgb tuple.

  • display_index (bool, optional) – Whether to display row indices. Default is True.

  • display_float_precision (int, optional) – The number of decimal places to display for float values. Default is 2.

  • infer_types (bool, optional) – Whether to infer the data types based on a randomly selected sample. Default is False, using first row to derive the corresponding type directly.

  • table_style (str, optional) – The styling to use when representing the table in textual formats. Must be ‘ascii’, ‘bare’, ‘dash’, ‘default’, ‘double’, ‘latex’, ‘list’, ‘markdown’, ‘outline’, ‘pandas’, ‘polars’, ‘postgresql’, ‘round’, ‘rst-grid’ or ‘rst-simple’. Default is ‘default’.

Raises:
  • ValueError – If data and headers are not provided, or if data is of insufficient length.

  • TypeError – If data or headers is not a valid type (list or tuple), or if dtypes is not a dictionary.

  • DimensionError – If the length of headers does not match the implied column count from the data.

  • SQLProgrammingError – If there’s an issue with executing SQL statements during initialization.

Example:

import sqldatamodel as sdm

# Create sample data
data = [('Alice', 20, 'F'), ('Bob', 25, 'M'), ('Gerald', 30, 'M')]

# Create the model with custom headers
df = sdm.SQLDataModel(data, headers=['Name','Age','Sex'])

# Display the model
print(df)

This will output the SQLDataModel formatted to fit within the current terminal:

┌────────┬──────┬──────┐
│ Name   │  Age │ Sex  │
├────────┼──────┼──────┤
│ Alice  │   20 │ F    │
│ Bob    │   25 │ M    │
│ Gerald │   30 │ M    │
└────────┴──────┴──────┘
[3 rows x 3 columns]

A SQLDataModel can be initialized from dozens of data formats, including python dictionaries:

import sqldatamodel as sdm

# Dictionary with sample data
data = {
    'Name': ['Ali', 'Bob', 'Chris'],
    'Role': ['Judge', 'Pilot', 'Nurse'],
    'Height': [174.2, 180.9, 173.4],
}

# Create the model and set a new style
df = sdm.SQLDataModel(data, table_style='list')

# View it
print(df)

This will output the SQLDataModel using the ‘list’ styling:

Name   Role    Height
-----  -----  -------
Ali    Judge   174.20
Bob    Pilot   180.90
Chris  Nurse   173.40

Note

  • If data is not provided, an empty model is created with the provided headers. At least one of data, headers or dtypes is required to instantiate the model.

  • If headers are not provided, default headers will be generated using the format '0', '1', ..., N where N is the column count.

  • If dtypes is provided, it must be a dictionary with column names as keys and Python data types as string values, e.g., {‘first_name’: ‘str’, ‘weight’: ‘float’}

  • If infer_types = True and dtypes are provided, the order will be resolved by first inferring the types, then overriding the inferred types for each {col:type} provided in the dtypes argument. If one is not provided, then the inferred type will be used as a fallback.

  • To create a SQLDataModel from file formats like CSV, Markdown, LaTeX, Excel, Parquet or Text files, see SQLDataModel.from_data() or use the format-specific constructor.

  • To create a SQLDataModel from object formats like Pyarrow, JSON, HTML, Pandas, Numpy or Polars, see the format-specific constructors such as SQLDataModel.from_pandas() or SQLDataModel.from_numpy().

  • Use SQLDataModel.set_table_style() to change the format and styling used when displaying the model.

  • Use SQLDataModel.set_display_index() to toggle inclusion of index column in table representations.

  • Use SQLDataModel.set_display_color() to modify the terminal color used to style the model.

  • Use SQLDataModel.set_display_max_rows() to modify the number of rows output in the representations.
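The resolution order described in the notes for infer_types and dtypes can be sketched in plain Python. This is only an illustration of the documented order (infer first, then override), not the library's implementation; the helper names infer_dtype and resolve_dtypes are hypothetical.

```python
def infer_dtype(value):
    """Return the Python type name of a sample value (hypothetical helper)."""
    return type(value).__name__


def resolve_dtypes(headers, sample_row, dtypes=None, infer_types=False):
    """Infer a type per column, then let explicit dtypes override it."""
    resolved = {}
    if infer_types:
        # Step 1: start from the inferred type of each column
        resolved = {col: infer_dtype(val) for col, val in zip(headers, sample_row)}
    # Step 2: explicit {col: type} entries win; other columns keep the inferred type
    resolved.update(dtypes or {})
    return resolved


headers = ['first_name', 'weight', 'age']
sample_row = ('Alice', '61.5', 30)

result = resolve_dtypes(headers, sample_row, dtypes={'weight': 'float'}, infer_types=True)
print(result)  # {'first_name': 'str', 'weight': 'float', 'age': 'int'}
```

Here 'weight' would be inferred as 'str' from the sample value, but the explicit entry in dtypes replaces it, while the remaining columns fall back to their inferred types.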

Changelog:
  • Version 2.3.0 (2026-01-21):
    • Modified to handle decimal.Decimal type by lossy conversion to python’s float type

  • Version 0.12.0 (2024-07-06):
  • Version 0.11.0 (2024-07-05):
    • Added additional option ‘latex’ for table_style parameter.

  • Version 0.9.3 (2024-06-28):
    • Added additional options ‘rst-simple’ and ‘rst-grid’ for table_style parameter.
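The "lossy conversion" mentioned for decimal.Decimal in version 2.3.0 can be illustrated with the standard library alone. The sketch below does not show how SQLDataModel performs the conversion; it only demonstrates why any Decimal to float conversion may lose digits.

```python
from decimal import Decimal

# A decimal value with more significant digits than a binary double can hold
exact = Decimal('0.12345678901234567890123456789')

# Converting to float keeps only the nearest representable double
as_float = float(exact)

# Converting back exposes the exact value of that double, which differs
round_trip = Decimal(as_float)

print(type(as_float).__name__)  # float
print(round_trip == exact)      # False
```

If exact decimal arithmetic matters, it may be safer to store such values as strings before loading them and convert back after export.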

__ipow__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the **= operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to raise each element in the SQLDataModel to.

Raises:

TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

Returns:

The modified SQLDataModel after the exponential operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Salary'])

# More raises!
df['Salary'] **= 2

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__isub__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the -= operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to subtract from each element in the SQLDataModel.

Raises:

TypeError – If the provided value is not a valid type (int, float, or SQLDataModel).

Returns:

The modified SQLDataModel after the subtraction operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age', 'service']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 39, 0.7),
    (2, 'mike', 'harlin', 36, 3),
    (3, 'pat', 'douglas', 42, 11.5)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Modifying age column in the best direction
df['age'] -= 10

# View model
print(df)

This will output:

┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john   │ smith   │     17 │    1.22 │
│ 1 │ sarah  │ west    │     29 │    0.70 │
│ 2 │ mike   │ harlin  │     26 │    3.00 │
│ 3 │ pat    │ douglas │     32 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__iter__() Iterator[tuple][source]

Returns an iterator over the current range of rows in the SQLDataModel starting from the first row.

Raises:

StopIteration – When there are no more rows to return.

Yields:

tuple – Next row fetched from the current SQLDataModel.

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Iterate through rows
for row in df:
    print(row)

This will output:

(0, 'John', 30, 175.3)
(1, 'Alice', 28, 162.0)
(2, 'Travis', 35, 185.8)
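As a rough illustration of the iteration behavior shown above, the following plain-Python stand-in yields (index, *values) tuples in the same shape. RowStore is a hypothetical class used only for this sketch and is not part of the package.

```python
class RowStore:
    """Hypothetical stand-in that yields rows as (index, *values) tuples."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __iter__(self):
        # A generator handles StopIteration automatically once rows run out
        for idx, row in enumerate(self.rows):
            yield (idx, *row)


store = RowStore([('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8)])

# Rows can be unpacked directly in the loop header
names = [name for idx, name, age, height in store]

# Manual iteration: next() with a default avoids an uncaught StopIteration
it = iter(store)
first = next(it)
remaining = list(it)
after_end = next(it, None)

print(names)  # ['John', 'Alice', 'Travis']
print(first)  # (0, 'John', 30, 175.3)
```

A for loop consumes StopIteration internally, so it only needs explicit handling when calling next() by hand.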

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__le__(other) set[int][source]

Implements the less than or equal to operator <= for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

The set of row indices resulting from the operation that satisfy the condition.

Return type:

set[int]

Example:

import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter by 'Age' column
df = df[df['Age'] <= 40]

# View result
print(df)

This will output:

┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐
│   │ First │ Last   │  Age │ Service │ Hired      │ Gender │
├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ John  │ Smith  │   27 │    1.22 │ 2023-02-01 │ Male   │
│ 1 │ Kelly │ Lee    │   32 │    8.00 │ 2016-09-18 │ Female │
│ 2 │ Mike  │ Harlin │   36 │    3.90 │ 2020-08-27 │ Male   │
└───┴───────┴────────┴──────┴─────────┴────────────┴────────┘
[3 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.

  • All the comparison operations return a Python set object containing the row indices which were returned from the evaluation.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.
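The set-based chaining described in the notes can be mimicked with ordinary Python sets. This is a minimal sketch over plain tuples, assuming each comparison yields a set of matching row indices as documented; the where helper is hypothetical and not part of the package.

```python
rows = [
    ('John', 'Smith', 27, 'Male'),
    ('Kelly', 'Lee', 32, 'Female'),
    ('Mike', 'Harlin', 36, 'Male'),
    ('Sarah', 'West', 51, 'Female'),
    ('Pat', 'Douglas', 42, 'Male'),
]
AGE, GENDER = 2, 3  # column positions within each tuple


def where(predicate):
    """Return the set of row indices for which predicate(row) is true."""
    return {idx for idx, row in enumerate(rows) if predicate(row)}


age_le_40 = where(lambda r: r[AGE] <= 40)
is_male = where(lambda r: r[GENDER] == 'Male')

both = age_le_40 & is_male    # intersection: rows satisfying both conditions
either = age_le_40 | is_male  # union: rows satisfying at least one condition

print(sorted(both))    # [0, 2]
print(sorted(either))  # [0, 1, 2, 4]
```

In Python, & and | bind more tightly than comparison operators, which is why each comparison needs its own parentheses when written inline, for example df[(df['Age'] <= 40) & (df['Gender'] == 'Male')].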

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__len__() int[source]

Returns the SQLDataModel.row_count property for the current SQLDataModel which represents the current number of rows in the model.

Returns:

The total number of rows in the SQLDataModel.

Return type:

int

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get current length
num_rows = len(df)

# View number
print(num_rows)

This will output:

1000

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__lt__(other) set[int][source]

Implements the less than operator < for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

The set of row indices resulting from the operation that satisfy the condition.

Return type:

set[int]

Example:

import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter by 'Age' column
df = df[df['Age'] < 40]

# View result
print(df)

This will output:

┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐
│   │ First │ Last   │  Age │ Service │ Hired      │ Gender │
├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ John  │ Smith  │   27 │    1.22 │ 2023-02-01 │ Male   │
│ 1 │ Kelly │ Lee    │   32 │    8.00 │ 2016-09-18 │ Female │
│ 2 │ Mike  │ Harlin │   36 │    3.90 │ 2020-08-27 │ Male   │
└───┴───────┴────────┴──────┴─────────┴────────────┴────────┘
[3 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.

  • All the comparison operations return a Python set object containing the row indices which were returned from the evaluation.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__mul__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the * operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to multiply each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as multiplication) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the multiplication operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar multiplication
df['x * 10'] = df['x'] * 10

# Perform vector multiplication using another column
df['x * y'] = df['x'] * df['y']

# View results
print(df)

This will output:

┌─────┬─────┬────────┬───────┐
│   x │   y │ x * 10 │ x * y │
├─────┼─────┼────────┼───────┤
│   2 │  10 │     20 │    20 │
│   4 │  20 │     40 │    80 │
│   8 │  30 │     80 │   240 │
│  16 │  40 │    160 │   640 │
│  32 │  50 │    320 │  1600 │
└─────┴─────┴────────┴───────┘
[5 rows x 4 columns]

Note

  • Mixing multipliers such as int * float will work; however, an exception will be raised when attempting to perform multiplication on incompatible types such as str * float.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__ne__(other) set[int][source]

Implements the not equal to operator != for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

The set of row indices resulting from the operation that satisfy the condition.

Return type:

set[int]

Example:

import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter by 'First' column
df = df[df['First'] != 'John']

# View result
print(df)

This will output:

┌───┬───────┬─────────┬──────┬─────────┬────────────┬────────┐
│   │ First │ Last    │  Age │ Service │ Hired      │ Gender │
├───┼───────┼─────────┼──────┼─────────┼────────────┼────────┤
│ 1 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │ Female │
│ 2 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │ Male   │
│ 3 │ Sarah │ West    │   51 │    0.70 │ 2023-10-01 │ Female │
│ 4 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │ Male   │
└───┴───────┴─────────┴──────┴─────────┴────────────┴────────┘
[4 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.

  • All the comparison operations return a Python set object containing the row indices which were returned from the evaluation.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__or__(other: SQLDataModel) set[int][source]

Implements the bitwise OR operator | for combining the result sets of self and other.

Parameters:

other – The SQLDataModel to combine with.

Returns:

A set of indices representing the union of the result rows from both SQLDataModel instances.

Return type:

set[int]

Example:

import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Apply some filtering conditions to both models
filter_1 = df[df['Age'] > 40]
filter_2 = df[df['Gender'] == 'Male']

# Perform a bitwise OR operation to return a new model
result = df[filter_1 | filter_2]

# View result
print(result)

This will output the result of filtering by ‘Age’ or ‘Gender’:

┌───┬───────┬─────────┬─────┬─────────┬────────────┬────────┐
│   │ First │ Last    │ Age │ Service │ Hired      │ Gender │
├───┼───────┼─────────┼─────┼─────────┼────────────┼────────┤
│ 0 │ John  │ Smith   │  27 │    1.22 │ 2023-02-01 │ Male   │
│ 2 │ Mike  │ Harlin  │  36 │    3.90 │ 2020-08-27 │ Male   │
│ 3 │ Sarah │ West    │  51 │    0.70 │ 2023-10-01 │ Female │
│ 4 │ Pat   │ Douglas │  42 │   11.50 │ 2015-11-06 │ Male   │
└───┴───────┴─────────┴─────┴─────────┴────────────┴────────┘
[4 rows x 6 columns]

Note

  • If other is not an instance of SQLDataModel, a NotImplementedError is raised to be consistent with current conventions.

  • See SQLDataModel.__and__() for bitwise AND operation.

Changelog:
  • Version 0.7.4 (2024-06-13):
    • New method.

__pow__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the ** operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The exponent value to raise each element in the SQLDataModel to.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as exponentiation) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the exponential operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,1], [4,2], [8,3], [16,4], [32,5]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar exponentiation
df['y ** 2'] = df['y'] ** 2

# Perform vector exponentiation using another column
df['x ** y'] = df['x'] ** df['y']

# View results
print(df)

This will output:

┌─────┬─────┬────────┬──────────┐
│   x │   y │ y ** 2 │   x ** y │
├─────┼─────┼────────┼──────────┤
│   2 │   1 │      1 │        2 │
│   4 │   2 │      4 │       16 │
│   8 │   3 │      9 │      512 │
│  16 │   4 │     16 │    65536 │
│  32 │   5 │     25 │ 33554432 │
└─────┴─────┴────────┴──────────┘
[5 rows x 4 columns]

Note

  • Mixing exponent types such as int ** float will work; however, an exception will be raised when attempting to exponentiate incompatible types such as str ** float.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__radd__(value: str | int | float | SQLDataModel) SQLDataModel[source]

Implements the right side operand for + operator functionality for compatible SQLDataModel operations.

Parameters:

value (str | int | float | SQLDataModel) – The value to be added to each element in the SQLDataModel.

Raises:
  • TypeError – If the provided value is not a valid type (str, int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as addition) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the addition operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar addition
df['100 + x'] = 100 + df['x']

# Perform vector addition using another column
df['y + x'] = df['y'] + df['x']

# View both results
print(df)

This will output:

┌─────┬─────┬─────────┬───────┐
│   x │   y │ 100 + x │ y + x │
├─────┼─────┼─────────┼───────┤
│   2 │  10 │     102 │    12 │
│   4 │  20 │     104 │    24 │
│   8 │  30 │     108 │    38 │
│  16 │  40 │     116 │    56 │
│  32 │  50 │     132 │    82 │
└─────┴─────┴─────────┴───────┘
[5 rows x 4 columns]

We can also use addition to concatenate strings:

import sqldatamodel as sdm

# Sample data
headers = ['First', 'Last']
data = [['Alice', 'Smith'],['Bob', 'Johnson'],['Charlie', 'Hall'],['David', 'Brown']]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Prepend a scalar string to each value in the column
df['Prefixed First'] = 'Name: ' + df['First']

# Concatenate scalar and vector using existing columns
df['Full Name'] = df['First'] + ' ' + df['Last']

# View it
print(df)

This will output:

┌─────────┬─────────┬────────────────┬──────────────┐
│ First   │ Last    │ Prefixed First │ Full Name    │
├─────────┼─────────┼────────────────┼──────────────┤
│ Alice   │ Smith   │ Name: Alice    │ Alice Smith  │
│ Bob     │ Johnson │ Name: Bob      │ Bob Johnson  │
│ Charlie │ Hall    │ Name: Charlie  │ Charlie Hall │
│ David   │ Brown   │ Name: David    │ David Brown  │
└─────────┴─────────┴────────────────┴──────────────┘
[4 rows x 4 columns]

Note

  • Mixing summands such as int + float will work; however, an exception will be raised when attempting to perform addition on incompatible types such as str + float.

  • See SQLDataModel.__add__() for left side operand addition or SQLDataModel.__iadd__() for in-place addition.

Changelog:
  • Version 0.7.3 (2024-06-12):
    • New method.

__repr__() str[source]

Returns a pretty printed string representation of SQLDataModel formatted to the current terminal size.

Returns:

The string representation of the SQLDataModel instance output using display and format values set on instance.

Return type:

str

Example:

import sqldatamodel as sdm

# Sample data
headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27),
    (1, 'sarah', 'west', 29),
    (2, 'mike', 'harlin', 36),
    (3, 'pat', 'douglas', 42),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Display the string representation
print(df)

This will output the model using the default 'dynamic' alignment, which right-aligns numeric columns and left-aligns all others:

┌───┬────────┬─────────┬────────┐
│   │ first  │ last    │    age │
├───┼────────┼─────────┼────────┤
│ 0 │ john   │ smith   │     27 │
│ 1 │ sarah  │ west    │     29 │
│ 2 │ mike   │ harlin  │     36 │
│ 3 │ pat    │ douglas │     42 │
└───┴────────┴─────────┴────────┘
[4 rows x 3 columns]

Using 'left' column alignment:

# Using left alignment instead
df.set_column_alignment("left")

# See difference
print(df)

This will output:

┌───┬────────┬─────────┬────────┐
│   │ first  │ last    │ age    │
├───┼────────┼─────────┼────────┤
│ 0 │ john   │ smith   │ 27     │
│ 1 │ sarah  │ west    │ 29     │
│ 2 │ mike   │ harlin  │ 36     │
│ 3 │ pat    │ douglas │ 42     │
└───┴────────┴─────────┴────────┘
[4 rows x 3 columns]

Using 'center' column alignment:

# Using center alignment instead
df.set_column_alignment("center")

# See difference
print(df)

This will output:

┌───┬────────┬─────────┬────────┐
│   │ first  │  last   │  age   │
├───┼────────┼─────────┼────────┤
│ 0 │  john  │  smith  │   27   │
│ 1 │ sarah  │  west   │   29   │
│ 2 │  mike  │ harlin  │   36   │
│ 3 │  pat   │ douglas │   42   │
└───┴────────┴─────────┴────────┘
[4 rows x 3 columns]

Using 'right' column alignment:

# Using right alignment instead
df.set_column_alignment("right")

# See difference
print(df)

This will output:

┌───┬────────┬─────────┬────────┐
│   │  first │    last │    age │
├───┼────────┼─────────┼────────┤
│ 0 │   john │   smith │     27 │
│ 1 │  sarah │    west │     29 │
│ 2 │   mike │  harlin │     36 │
│ 3 │    pat │ douglas │     42 │
└───┴────────┴─────────┴────────┘
[4 rows x 3 columns]
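The alignment options shown above can be approximated with Python's built-in string padding methods. This is a sketch under the assumption that 'dynamic' means right for numeric values and left otherwise, as stated earlier; align_cell is a hypothetical helper, not the package's formatter.

```python
def align_cell(value, width, alignment='dynamic'):
    """Pad a single cell to the given width using the requested alignment."""
    text = str(value)
    if alignment == 'dynamic':
        # Numeric values are right-aligned, everything else left-aligned
        alignment = 'right' if isinstance(value, (int, float)) else 'left'
    if alignment == 'left':
        return text.ljust(width)
    if alignment == 'center':
        return text.center(width)
    if alignment == 'right':
        return text.rjust(width)
    raise ValueError(f"unknown alignment: {alignment!r}")


row = ('john', 'smith', 27)
widths = (6, 7, 6)

cells = [align_cell(value, width) for value, width in zip(row, widths)]
line = '│ ' + ' │ '.join(cells) + ' │'
print(line)  # │ john   │ smith   │     27 │
```

Passing 'left', 'center' or 'right' explicitly applies the same padding to every cell regardless of its type.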

Changelog:
  • Version 0.12.0 (2024-07-06):
    • Changed default behavior to display a minimum of 4 rows when display_max_rows = None to retain data visibility when terminal size is below threshold.

  • Version 0.10.4 (2024-07-03):
    • Modified to escape newline characters through utils.sqlite_printf_format() to avoid wrapping table rows.

  • Version 0.7.0 (2024-06-08):
    • Modified horizontal truncation behavior to alternate column selection between table start and table end instead of sequential left to right ordering.

__rfloordiv__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the right side operand // operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to be divided by each element in the SQLDataModel.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

  • ZeroDivisionError – If value is 0.

Returns:

A new SQLDataModel resulting from the floor division operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,8], [4,16], [8,32], [32,64], [32,128]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar floor division
df['128 // y'] = 128 // df['y']

# Perform vector floor division using another column
df['y // x'] = df['y'] // df['x']

# View both results
print(df)

This will output:

┌─────┬─────┬──────────┬────────┐
│   x │   y │ 128 // y │ y // x │
├─────┼─────┼──────────┼────────┤
│   2 │   8 │       16 │      4 │
│   4 │  16 │        8 │      4 │
│   8 │  32 │        4 │      4 │
│  32 │  64 │        2 │      2 │
│  32 │ 128 │        1 │      4 │
└─────┴─────┴──────────┴────────┘
[5 rows x 4 columns]

Note

  • Mixing divisor types such as int // float will work; however, an exception will be raised when attempting to perform division on incompatible types such as str // float.

  • See SQLDataModel.__floordiv__() for standard left side operand implementation of floor division operations.
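Because floor division is not commutative, the reflected method has to keep the original operand order. The sketch below uses a hypothetical minimal Column class, unrelated to the package's internals, to show how Python dispatches 128 // column to __rfloordiv__ and which operand ends up as the dividend.

```python
class Column:
    """Hypothetical minimal column type, used only to illustrate operand order."""

    def __init__(self, values):
        self.values = list(values)

    def __floordiv__(self, other):
        # column // scalar: each element is the dividend
        return Column(v // other for v in self.values)

    def __rfloordiv__(self, other):
        # scalar // column: the scalar is the dividend, each element the divisor
        return Column(other // v for v in self.values)


y = Column([8, 16, 32, 64, 128])

left = (y // 8).values     # handled by __floordiv__
right = (128 // y).values  # int cannot handle Column, so Python calls __rfloordiv__

print(left)   # [1, 2, 4, 8, 16]
print(right)  # [16, 8, 4, 2, 1]
```

The right list matches the '128 // y' column in the example output above.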

Changelog:
  • Version 0.7.7 (2024-06-17):
    • New method.

__rmul__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the right side operand for * operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to multiply each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as multiplication) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the multiplication operation.

Return type:

SQLDataModel

Note

  • See SQLDataModel.__mul__() for additional details and usage examples.

  • This function simply wraps the primary multiplication method after swapping the order of the arguments.
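The delegation described in the note can be sketched with a small stand-in class. This is an assumption-laden illustration of the general Python pattern rather than the package's code: Column is hypothetical, and only scalar multipliers are handled.

```python
class Column:
    """Hypothetical minimal column type supporting scalar multiplication."""

    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, other):
        if not isinstance(other, (int, float)):
            # Signal that this operand type is unsupported
            return NotImplemented
        return Column(v * other for v in self.values)

    def __rmul__(self, other):
        # Multiplication commutes, so reuse __mul__ with the operands swapped
        return self.__mul__(other)


x = Column([2, 4, 8, 16, 32])

left = (x * 10).values   # dispatches to __mul__
right = (10 * x).values  # int.__mul__ returns NotImplemented, so __rmul__ runs

print(left)   # [20, 40, 80, 160, 320]
print(right)  # [20, 40, 80, 160, 320]
```

This shortcut only works for commutative operations; subtraction, division and exponentiation need their own reflected logic.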

Changelog:
  • Version 0.7.3 (2024-06-12):
    • New method.

__rpow__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the right side operand ** operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The base value to be raised to the power of each element in the SQLDataModel.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as exponentiation) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the exponential operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,1], [4,2], [6,3], [8,4], [10,5]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar exponentiation
df['2 ** y'] = 2 ** df['y']

# Perform vector exponentiation using another column
df['y ** x'] = df['y'] ** df['x']

# View results
print(df)

This will output:

┌─────┬─────┬────────┬─────────┐
│   x │   y │ 2 ** y │  y ** x │
├─────┼─────┼────────┼─────────┤
│   2 │   1 │      2 │       1 │
│   4 │   2 │      4 │      16 │
│   6 │   3 │      8 │     729 │
│   8 │   4 │     16 │   65536 │
│  10 │   5 │     32 │ 9765625 │
└─────┴─────┴────────┴─────────┘
[5 rows x 4 columns]

Note

  • Mixing exponent types such as int ** float will work; however, an exception will be raised when attempting to exponentiate incompatible types such as str ** float.

  • See SQLDataModel.__pow__() for standard left side operand implementation of exponential operations.

Changelog:
  • Version 0.7.7 (2024-06-17):
    • New method.

__rsub__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the right side operand for - operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value from which each element in the SQLDataModel is subtracted.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as subtraction) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the subtraction operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar subtraction
df['100 - x'] = 100 - df['x']

# Perform vector subtraction using another column
df['y - x'] = df['y'] - df['x']

# View both results
print(df)

This will output:

┌─────┬─────┬─────────┬───────┐
│   x │   y │ 100 - x │ y - x │
├─────┼─────┼─────────┼───────┤
│   2 │  10 │      98 │     8 │
│   4 │  20 │      96 │    16 │
│   8 │  30 │      92 │    22 │
│  16 │  40 │      84 │    24 │
│  32 │  50 │      68 │    18 │
└─────┴─────┴─────────┴───────┘
[5 rows x 4 columns]

Note

  • Mixing operand types such as int - float will work; however, an exception will be raised when attempting to perform subtraction on incompatible types such as str - float.

  • See SQLDataModel.__sub__() for left side operand subtraction or SQLDataModel.__isub__() for in-place subtraction.

Changelog:
  • Version 0.7.3 (2024-06-12):
    • New method.

__rtruediv__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the right side operand / operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to be divided by each element in the SQLDataModel.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

  • ZeroDivisionError – If value is 0.

Returns:

A new SQLDataModel resulting from the division operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar division
df['10 / y'] = 10 / df['y']

# Perform vector division using another column
df['x / y'] = df['x'] / df['y']

# View both results
print(df)

This will output:

┌─────┬─────┬────────┬───────┐
│   x │   y │ 10 / y │ x / y │
├─────┼─────┼────────┼───────┤
│   2 │  10 │   1.00 │  0.20 │
│   4 │  20 │   0.50 │  0.20 │
│   8 │  30 │   0.33 │  0.27 │
│  16 │  40 │   0.25 │  0.40 │
│  32 │  50 │   0.20 │  0.64 │
└─────┴─────┴────────┴───────┘
[5 rows x 4 columns]

Note

  • Mixing divisor types such as int / float will work; however, an exception will be raised when attempting to perform division on incompatible types such as str / float.

  • See SQLDataModel.__truediv__() for left side operand division operations.

Changelog:
  • Version 0.7.3 (2024-06-12):
    • New method.

__setitem__(target_indicies, update_values) None[source]

Updates specified rows and columns in the SQLDataModel with the provided values.

Parameters:
  • target_indicies – Indices specifying the rows and columns to be updated. This can be an integer, a tuple, a slice, or a combination of these.

  • update_values – The values to be assigned to the corresponding model records. It can be of types: str, int, float, bool, bytes, list, tuple, or another SQLDataModel object.

Raises:
  • TypeError – If the update_values type is not compatible with SQL datatypes.

  • DimensionError – If there is a shape mismatch between targeted indices and provided update values.

  • ValueError – If there are issues with the specified indices, such as invalid row or column names.

Returns:

None

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Job']
data = [
    ('Billy', 30, 'Barber'),
    ('Alice', 28, 'Doctor'),
    ('John', 25, 'Technician'),
    ('Travis', 35, 'Musician'),
    ('William', 15, 'Student')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Update a specific row with new values
df[2] = ("John", 25, "Engineer")

# See result
print(df)

This will output:

┌───┬─────────┬──────┬──────────┐
│   │ Name    │  Age │ Job      │
├───┼─────────┼──────┼──────────┤
│ 0 │ Billy   │   30 │ Barber   │
│ 1 │ Alice   │   28 │ Doctor   │
│ 2 │ John    │   25 │ Engineer │
│ 3 │ Travis  │   35 │ Musician │
│ 4 │ William │   15 │ Student  │
└───┴─────────┴──────┴──────────┘
[5 rows x 3 columns]

Conditional updates can also be made using multiple columns:

import sqldatamodel as sdm

headers = ['Employee', 'Base', 'Salary']
data = [
    ('Alice', '58,500', '62,250'),
    ('Bobby', '60,750',  None),
    ('Chloe', '58,500', '63,125'),
    ('David', '65,000',  None),
    ('Ellie', '65,000',  None),
    ('Fiona', '65,000', '71,450'),
]

# Create sample model
df = sdm.SQLDataModel(data, headers)

# Selectively update values based on conditions
df[df['Salary'].isna(), 'Salary'] = df['Base']

# View updates
print(df)

This will output the resulting model where ‘Salary’ was updated with values from ‘Base’ only if missing:

┌───┬──────────┬────────┬────────┐
│   │ Employee │ Base   │ Salary │
├───┼──────────┼────────┼────────┤
│ 0 │ Alice    │ 58,500 │ 62,250 │
│ 1 │ Bobby    │ 60,750 │ 60,750 │
│ 2 │ Chloe    │ 58,500 │ 63,125 │
│ 3 │ David    │ 65,000 │ 65,000 │
│ 4 │ Ellie    │ 65,000 │ 65,000 │
│ 5 │ Fiona    │ 65,000 │ 71,450 │
└───┴──────────┴────────┴────────┘
[6 rows x 3 columns]

Values for multiple rows and columns can also be set, shown here using the Name, Age and Job model from the first example:

# Update multiple rows and columns with a list of values
df[1:5, ["Name", "Age", "Job"]] = [("Alice", 30, "Manager"), ("Bob", 28, "Developer"), ("Charlie", 35, "Designer"), ("David", 32, "Analyst")]

# See result
print(df)

This will output:

┌───┬─────────┬──────┬───────────┐
│   │ Name    │  Age │ Job       │
├───┼─────────┼──────┼───────────┤
│ 0 │ Billy   │   30 │ Barber    │
│ 1 │ Alice   │   30 │ Manager   │
│ 2 │ Bob     │   28 │ Developer │
│ 3 │ Charlie │   35 │ Designer  │
│ 4 │ David   │   32 │ Analyst   │
└───┴─────────┴──────┴───────────┘
[5 rows x 3 columns]

A new column can also be created by assigning one value per row to a column name that does not exist yet:

# Create a new column "Hobby" and set the values
df["Hobby"] = [('Fishing',), ('Biking',), ('Computers',), ('Photography',), ('Studying',)]

# See result
print(df)

This will output:

┌───┬─────────┬──────┬───────────┬─────────────┐
│   │ Name    │  Age │ Job       │ Hobby       │
├───┼─────────┼──────┼───────────┼─────────────┤
│ 0 │ Billy   │   30 │ Barber    │ Fishing     │
│ 1 │ Alice   │   30 │ Manager   │ Biking      │
│ 2 │ Bob     │   28 │ Developer │ Computers   │
│ 3 │ Charlie │   35 │ Designer  │ Photography │
│ 4 │ David   │   32 │ Analyst   │ Studying    │
└───┴─────────┴──────┴───────────┴─────────────┘
[5 rows x 4 columns]

Note

  • If update_values is another SQLDataModel object, its data will be normalized using the SQLDataModel.data() method.

  • The target_indicies parameter can be an integer, a tuple of disconnected row indices, a slice representing a range of rows, a string or list of strings representing column names, or a tuple combining row and column indices.

  • Values can be single values or iterables matching the specified rows and columns.

  • See SQLDataModel.apply() for setting values using a function.
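The conditional update shown earlier, `df[df['Salary'].isna(), 'Salary'] = df['Base']`, expresses roughly the same thing as an SQL `UPDATE` with a `WHERE` clause. The sketch below runs that statement directly with the standard library `sqlite3` module; the table layout mirrors the example, but the statement is an assumption about equivalent SQL and not necessarily what the library executes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "Employee" TEXT, "Base" TEXT, "Salary" TEXT)')
con.executemany(
    'INSERT INTO "sdm" ("Employee", "Base", "Salary") VALUES (?, ?, ?)',
    [("Alice", "58,500", "62,250"), ("Bobby", "60,750", None), ("David", "65,000", None)],
)

# Only rows where "Salary" is missing receive the value from "Base"
con.execute('UPDATE "sdm" SET "Salary" = "Base" WHERE "Salary" IS NULL')

salaries = [row[0] for row in con.execute('SELECT "Salary" FROM "sdm" ORDER BY "idx"')]
print(salaries)
```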

Changelog:
  • Version 0.7.5 (2024-06-14):
    • Added row indices masking to allow selective updating when update_values is also an instance of SQLDataModel using target_indicies as mask.

  • Version 0.1.9 (2024-03-19):
    • New method.

__sub__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the - operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to subtract from each element in the SQLDataModel.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as subtraction) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the subtraction operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar subtraction
df['x - 100'] = df['x'] - 100

# Perform vector subtraction using another column
df['x - y'] = df['x'] - df['y']

# View both results
print(df)

This will output:

┌─────┬─────┬─────────┬───────┐
│   x │   y │ x - 100 │ x - y │
├─────┼─────┼─────────┼───────┤
│   2 │  10 │     -98 │    -8 │
│   4 │  20 │     -96 │   -16 │
│   8 │  30 │     -92 │   -22 │
│  16 │  40 │     -84 │   -24 │
│  32 │  50 │     -68 │   -18 │
└─────┴─────┴─────────┴───────┘
[5 rows x 4 columns]

Note

  • Mixing operand types such as int - float will work; however, an exception will be raised when attempting to perform subtraction on incompatible types such as str - float.

  • See SQLDataModel.__rsub__() for right side operand subtraction operations.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

__truediv__(value: int | float | SQLDataModel) SQLDataModel[source]

Implements the / operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

  • ZeroDivisionError – If value is 0.

Returns:

A new SQLDataModel resulting from the division operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar division
df['y / 10'] = df['y'] / 10

# Perform vector division using another column
df['y / x'] = df['y'] / df['x']

# View both results
print(df)

This will output:

┌─────┬─────┬────────┬───────┐
│   x │   y │ y / 10 │ y / x │
├─────┼─────┼────────┼───────┤
│   2 │  10 │   1.00 │  5.00 │
│   4 │  20 │   2.00 │  5.00 │
│   8 │  30 │   3.00 │  3.75 │
│  16 │  40 │   4.00 │  2.50 │
│  32 │  50 │   5.00 │  1.56 │
└─────┴─────┴────────┴───────┘
[5 rows x 4 columns]

Note

  • Mixing divisor types such as int / float will work; however, an exception will be raised when attempting to perform division on incompatible types such as str / float.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

_calculate_col_widths(index: bool = None, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, split_row: int = None, index_rep: str = None) dict[str, int][source]

Calculate the maximum column widths for each header column based on the provided conditions to assist with representation methods.

Parameters:
  • index (bool, optional) – Indicates whether to include the index column in the calculations. Default is None, using SQLDataModel.display_index value.

  • min_column_width (int, optional) – Minimum width for columns. Default is None, using SQLDataModel.minimum_column_width value.

  • max_column_width (int, optional) – Maximum width for columns. Default is None, using SQLDataModel.maximum_column_width value.

  • float_precision (int, optional) – Precision for displaying float values. Default is None, using SQLDataModel.display_float_precision value.

  • split_row (int, optional) – Row index to determine vertical truncation. If None, no vertical truncation is applied.

  • index_rep (str, optional) – String representation for the index. If None, uses a single whitespace character ' ' to represent the index column.

Returns:

Dictionary mapping each header column to its maximum calculated width as {'column': width}.

Return type:

dict[str, int]

Example:

import sqldatamodel as sdm

# Sample data
headers = ['User', 'Key', 'Value', 'Active']
data = [
    ('Allison', 130, 237.03, True),
    ('Bobby', -400, 723.41, False),
    ('Connor', 698, 154.70, False),
    ('Dimitry', 287, 409.14, True)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Calculate the max column widths
col_widths = df._calculate_col_widths()

# View result
print(col_widths)

This will output the maximum widths for each column calculated by the column name and the corresponding values:

{'idx': 1, 'User': 7, 'Key': 4, 'Value': 7, 'Active': 6}

Note

  • When index_rep is provided, the length of the index representation will be used when calculating the maximum column width.

  • When split_row is provided, width calculation checks are restricted to only the top and bottom N number of rows specified.

  • Used by SQLDataModel.to_string() to determine appropriate column representation widths.
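The general idea of the calculation, taking the widest rendered cell per column and clamping it between the minimum and maximum width, can be sketched in plain Python. The helper `calculate_col_widths` below is hypothetical and works on in-memory rows; it ignores the index column and is not expected to reproduce the library's exact numbers.

```python
def calculate_col_widths(headers, rows, min_width=3, max_width=38, float_precision=2):
    """Hypothetical helper: widest rendered cell per column, clamped to [min_width, max_width]."""

    def render(value):
        # Floats are rendered at fixed precision, everything else through str()
        if isinstance(value, float):
            return f"{value:.{float_precision}f}"
        return str(value)

    widths = {}
    for position, header in enumerate(headers):
        widest = max([len(header)] + [len(render(row[position])) for row in rows])
        widths[header] = max(min_width, min(widest, max_width))
    return widths


widths = calculate_col_widths(
    headers=["id", "city", "ratio"],
    rows=[(1, "Oslo", 0.5), (22, "San Francisco", 12.125)],
    max_width=10,
)
print(widths)
```

Here "id" is raised to the minimum width and "city" is capped at the maximum, while "ratio" keeps its natural width.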

Changelog:
  • Version 2.3.0 (2026-01-21):
    • Modified handling of bytes/blob data to use sqlite’s quote function instead of printf to align with related methods in order to improve bytes representation

  • Version 0.11.0 (2024-07-05):
    • New method.

_generate_sql_stmt(columns: list[str] = None, rows: int | slice | tuple | str = None, index: bool = True, na_rep: str = None) str[source]

Generate an SQL statement for fetching specific columns and rows from the model, duplicate column references are aliased in order of appearance.

Parameters:
  • columns (list of str, optional) – The list of columns to include in the SQL statement. If not provided, all columns from the model will be included.

  • rows (int, slice, tuple, str, optional) – The rows to include in the SQL statement. It can be an integer for a single row, a slice for a range of rows, a tuple for specific row indices, or a string predicate that bypasses numeric range-based selection. If not provided, all rows will be included.

  • index (bool, optional) – If True, include the primary index column in the SQL statement.

  • na_rep (str, optional) – If provided, all null or empty string values are replaced with value.

Returns:

The generated SQL statement.

Return type:

str
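As a rough illustration of the behavior described above, the sketch below builds a `SELECT` statement with aliased duplicate columns and an optional row filter, then runs it with `sqlite3`. The function `build_select`, the `_2` alias suffix and the way a string predicate is attached are all assumptions made for this example; slices are omitted for brevity and column names are not escaped.

```python
import sqlite3


def build_select(columns, rows=None, index=True, table="sdm"):
    """Hypothetical builder; the alias format for duplicates is an assumption."""
    seen = {}
    parts = ['"idx"'] if index else []
    for column in columns:
        count = seen.get(column, 0)
        seen[column] = count + 1
        # Repeated references get a numeric suffix in order of appearance
        alias = column if count == 0 else f"{column}_{count + 1}"
        parts.append(f'"{column}" AS "{alias}"')
    stmt = f'SELECT {", ".join(parts)} FROM "{table}"'
    if isinstance(rows, int):
        stmt += f' WHERE "idx" = {rows}'
    elif isinstance(rows, tuple):
        stmt += f' WHERE "idx" IN ({", ".join(str(int(r)) for r in rows)})'
    elif isinstance(rows, str):
        # A string is treated as a ready-made predicate (assumed form)
        stmt += f" WHERE {rows}"
    return stmt + ' ORDER BY "idx"'


con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "Name" TEXT, "Age" INTEGER)')
con.executemany('INSERT INTO "sdm" VALUES (?, ?, ?)', [(0, "Ann", 31), (1, "Bo", 48), (2, "Cy", 25)])

stmt = build_select(["Name", "Age", "Name"], rows=(0, 2), index=False)
result = con.execute(stmt).fetchall()
print(stmt)
print(result)
```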

Note

Changelog:
  • Version 0.5.1 (2024-05-10):
    • Modified to allow rows argument to be provided directly as a string predicate to bypass numeric range-based selections.

  • Version 0.4.0 (2024-04-23):
    • Added na_rep parameter to fill null or missing fields with provided value.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.9 (2024-03-19):
    • New method.

_generate_sql_stmt_fetchall(index: bool = True) str[source]

Generates an SQL statement for fetching all current rows and columns in SQLDataModel.

Parameters:

index (bool, optional) – Whether or not to include index column in the SQL statement. Default is True, including the index.

Returns:

The generated SQL statement selecting all rows and columns.

Return type:

str

Example:

import sqldatamodel as sdm

# Create a sample model
df = sdm.from_shape(shape=(10,3), headers=['Name','Age','Sex'])

# Generate an SQL statement for all data
sql_stmt = df._generate_sql_stmt_fetchall(index=False)

# View it
print(sql_stmt)

This will output the statement required to fetch all the data:

SELECT
    "Name" AS "Name",
    "Age" AS "Age",
    "Sex" AS "Sex"
FROM
    "sdm"
ORDER BY
    "idx"

Note

  • Used internally for methods selecting all the current rows and columns

  • See SQLDataModel._generate_sql_stmt() for generating statements for specified rows and columns only.

Changelog:
  • Version 0.10.0 (2024-06-29):
    • New method.

_generate_table_style(style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = None) tuple[tuple[str]][source]

Generates the character sets required for formatting SQLDataModel according to the value currently set at SQLDataModel.table_style.

Parameters:

style (Literal['ascii','bare','dash','default','double','latex','list','markdown','outline','pandas','polars','postgresql','round','rst-grid','rst-simple'], optional) – The table style to return. Default is value set on SQLDataModel.table_style.

Returns:

A 4-tuple containing the characters required for top, middle, row and lower table sections.

Return type:

tuple[tuple[str]]

Note

  • This method is called by SQLDataModel.__repr__() to parse the characters necessary for constructing the tabular representation of the SQLDataModel. Any modifications or changes to this method may result in unexpected behavior.

Changelog:
  • Version 0.11.0 (2024-07-05):
    • Added ‘latex’ style format.

  • Version 0.9.3 (2024-06-28):
    • Added ‘rst-simple’ and ‘rst-grid’ style formats.

  • Version 0.3.10 (2024-04-16):
    • Added style parameter to allow use by SQLDataModel.to_text() to generate new formatting styles introduced in version 0.3.9.

  • Version 0.3.8 (2024-04-12):
    • New method.

_get_display_args(include_dtypes: bool = False) dict[source]

Retrieves the current display configuration settings of the SQLDataModel with the correct kwargs for the class SQLDataModel.__init__() method.

Parameters:

include_dtypes (bool, optional) – Whether SQLDataModel.dtypes should be included in the result. Default is False, including only display arguments.

Returns:

A dictionary containing the display configuration settings in the format {'setting': 'value'}.

Return type:

dict

Display Properties:
Dtype Property:
  • SQLDataModel.dtypes: A dictionary mapping the current model’s columns to their corresponding Python data type.

Changelog:
  • Version 0.6.2 (2024-05-15):
  • Version 0.1.9 (2024-03-19):
    • New method.

_get_sql_create_stmt() str[source]

Retrieves the SQL create table statement used to create the current SQLDataModel.

Returns:

The SQL create table statement for the SQLDataModel.

Return type:

str

Example:

import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27),
    (1, 'sarah', 'west', 29),
    (2, 'mike', 'harlin', 36),
    (3, 'pat', 'douglas', 42)
]

# Create the sample model
df = sdm.SQLDataModel(data,headers)

# Retrieve the create statement for the SQLDataModel
create_stmt = df._get_sql_create_stmt()

# Print the returned statement
print(create_stmt)

This will output:

CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY,"first" TEXT,"last" TEXT,"age" INTEGER)

Changelog:
  • Version 0.1.5 (2023-11-24):
    • New method.
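For reference, SQLite itself keeps the text of each `CREATE TABLE` statement in its schema table, so a create statement like the one returned by `_get_sql_create_stmt()` can be read back with a plain query. The sketch below uses only `sqlite3`; whether the library retrieves the statement this way is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY,"first" TEXT,"last" TEXT,"age" INTEGER)')

# SQLite stores the CREATE statement text in the sqlite_master schema table
create_stmt = con.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'sdm'"
).fetchone()[0]
print(create_stmt)
```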

_update_indicies() None[source]

Updates the SQLDataModel.indicies and SQLDataModel.row_count properties of the SQLDataModel instance representing the current valid row indices and count.

Returns:

None

Note

  • This method is called internally any time the SQLDataModel.row_count property is subject to change, or data manipulation requires updating the current values.

  • There is no reason to call this method manually unless the model has been changed outside of the standard instance methods.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

_update_indicies_deterministic(row_index: int) None[source]

Quick implementation to update the SQLDataModel.indicies and SQLDataModel.row_count properties of the SQLDataModel instance representing the current valid row indices and count based on the last inserted rowid.

Returns:

None

Note

Changelog:
  • Version 0.6.0 (2024-05-14):
    • Improves performance for updating row indices when update is deterministic.

    • New method.

_update_model_metadata(update_row_meta: bool = False) None[source]

Generates and updates metadata information about the columns and optionally the rows in the SQLDataModel instance based on the current model.

Attributes updated:
Parameters:

update_row_meta (bool, optional) – If True, updates row metadata information; otherwise, retrieves column metadata only (default).

Returns:

None

Example:

import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age', 'service_time']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 39, 0.7),
    (2, 'mike', 'harlin', 36, 3),
    (3, 'pat', 'douglas', 42, 11.5)
]

# Create the model with sample data
df = sdm.SQLDataModel(data, headers)

# View header master
print(df.header_master)

This will output:

{'first': ('TEXT', 'str', True, '<'),
 'last': ('TEXT', 'str', True, '<'),
 'age': ('INTEGER', 'int', True, '>'),
 'service_time': ('REAL', 'float', True, '>'),
 'idx': ('INTEGER', 'int', False, '>')}

Example Attributes Modified:

import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age', 'service_time']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 39, 0.7),
    (2, 'mike', 'harlin', 36, 3),
    (3, 'pat', 'douglas', 42, 11.5)
]

# Create the model with sample data
df = sdm.SQLDataModel(data, headers)

# Get current column count
num_cols_before = df.column_count

# Add new column
df['new_column'] = 'empty'

# Method is called behind the scenes
df._update_model_metadata()

# Get new column count
num_cols_after = df.column_count

# View difference
print(f"cols before: {num_cols_before}, cols after: {num_cols_after}")

Note

  • This method is called after operations that may modify the current model’s structure and require synchronization.
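Column metadata of the kind shown in `header_master` can be read from SQLite with `PRAGMA table_info`. The sketch below collects the declared type and primary-key flag for each column using `sqlite3` directly; the shape of `column_meta` is a simplification for illustration and differs from the library's tuples.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "first" TEXT, "age" INTEGER, "service_time" REAL)')

# Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk)
column_meta = {
    name: (sql_type, bool(pk))
    for _, name, sql_type, _, _, pk in con.execute('PRAGMA table_info("sdm")')
}
print(column_meta)
```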

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

_update_rows_and_columns_with_values(rows_to_update: tuple[int] = None, columns_to_update: list[str] = None, values_to_update: list[tuple] = None) None[source]

Generates and executes a SQL update statement to modify specific rows and columns with provided values in the SQLDataModel.

Parameters:
  • rows_to_update – A tuple of row indices to be updated. If set to None, it defaults to all rows in the SQLDataModel.

  • columns_to_update – A list of column names to be updated. If set to None, it defaults to all columns in the SQLDataModel.

  • values_to_update – A list of tuples representing values to update in the specified rows and columns.

Raises:
  • TypeError – If the values_to_update parameter is not a list or tuple.

  • DimensionError – If the shape of the provided values does not match the specified rows and columns.

  • SQLProgrammingError – If the values_to_update parameter contains invalid or SQL incompatible data.

Example:

import sqldatamodel as sdm

# Create the model with some sample data
df = sdm.SQLDataModel(
    data=[(23, 'W'), (24, 'X'), (25, 'Y'), (26, 'Z')],
    headers=['column1', 'column2']
)

# Update specific rows and columns with provided values
df._update_rows_and_columns_with_values(
    rows_to_update=(1, 2, 3),
    columns_to_update=["column1", "column2"],
    values_to_update=[(10, 'A'), (20, 'B'), (30, 'C')]
)

# Create a new column named "new_column" with default values
df._update_rows_and_columns_with_values(
    columns_to_update=["new_column"],
    values_to_update=[(None,)] * df.row_count
)

Note

  • Used by SQLDataModel.__setitem__() to broadcast updates across row and column index ranges.

  • To create a new column, pass a single header item in a list to the columns_to_update parameter.

  • To copy an existing column, pass the corresponding data as a list of tuples to the values_to_update parameter.
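One way to picture the update is a single parameterized `UPDATE` statement executed once per targeted row. The sketch below does this with `sqlite3.Connection.executemany` on the same sample data; the statement text is an assumption and not necessarily what the method generates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "column1" INTEGER, "column2" TEXT)')
con.executemany(
    'INSERT INTO "sdm" VALUES (?, ?, ?)',
    [(0, 23, "W"), (1, 24, "X"), (2, 25, "Y"), (3, 26, "Z")],
)

rows_to_update = (1, 2, 3)
columns_to_update = ["column1", "column2"]
values_to_update = [(10, "A"), (20, "B"), (30, "C")]

# One parameterized UPDATE, executed once per (values..., row index) tuple
assignments = ", ".join(f'"{column}" = ?' for column in columns_to_update)
update_stmt = f'UPDATE "sdm" SET {assignments} WHERE "idx" = ?'
con.executemany(update_stmt, [(*values, row) for values, row in zip(values_to_update, rows_to_update)])

updated = con.execute('SELECT * FROM "sdm" ORDER BY "idx"').fetchall()
print(updated)
```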

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

_validate_column(column: str | int | slice | Iterable, unmodified: bool = False) list[str][source]

Utility function used to validate column selection and return parsed values.

Parameters:
  • column (str|int|slice|Iterable) – The column selection to validate, argument should reflect the integer indexes or column names.

  • unmodified (bool, optional) – Whether column should be returned as originally indexed. Default is False, returning as list.

Raises:
  • TypeError – If column is not one of type ‘str’, ‘int’, ‘slice’ or ‘Iterable’ representing the integer index or values of column(s) to select.

  • IndexError – If column is outside of current model range bounded by SQLDataModel.column_count whether positively or negatively indexed.

  • DimensionError – If len(column) is greater than SQLDataModel.column_count when provided as an iterable or sequence.

Returns:

A list containing the validated column values resulting from the selection.

Return type:

list[str]

Example:

import sqldatamodel as sdm

# Create a 10 rows x 6 column model
df = sdm.from_shape((10, 6), headers=['A','B','C','D','E','F'])

# Various column index types
column_indicies = [
    2,
    -3,
    ['A','B'],
    (4, 5, 1),
    [-1, 2, 0],
    slice(1, 3),
    slice(-6, -1, 2),
]

# Loop over the indicies
for column_index in column_indicies:
    # Print original and validated indexes
    print(f"{column_index} --> {df._validate_column(column_index)}")

This will output the original and validated column indexes:

2 --> ['C']
-3 --> ['D']
['A', 'B'] --> ['A', 'B']
(4, 5, 1) --> ['E', 'F', 'B']
[-1, 2, 0] --> ['F', 'C', 'A']
slice(1, 3, None) --> ['B', 'C']
slice(-6, -1, 2) --> ['A', 'C', 'E']

Note

  • Columns are referenced by their integer index or directly by their value as a column name. When using integers, column = 0 and column = -1 will always return the first and last columns, respectively.

  • Validated column outputs will be returned as a list containing the results of the indexed columns found at SQLDataModel.headers with original ordering intact.

  • See SQLDataModel._validate_row() for validating row indices and returning the corresponding values.
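The normalization described above can be approximated in a few lines of plain Python. The function `validate_column` below is a hypothetical standalone version that takes the headers explicitly; it mirrors the documented exceptions only loosely and omits the unmodified option.

```python
def validate_column(column, headers):
    """Hypothetical normalizer returning column names for several index types."""
    count = len(headers)

    def resolve(item):
        if isinstance(item, str):
            if item not in headers:
                raise ValueError(f"column '{item}' not found")
            return item
        if isinstance(item, int) and not isinstance(item, bool):
            if not -count <= item < count:
                raise IndexError(f"column index {item} out of range")
            return headers[item]
        raise TypeError(f"invalid column type '{type(item).__name__}'")

    if isinstance(column, slice):
        # List slicing already handles negative bounds and steps
        return headers[column]
    if isinstance(column, (str, int)):
        return [resolve(column)]
    return [resolve(item) for item in column]


headers = ["A", "B", "C", "D", "E", "F"]
print(validate_column(-3, headers))
print(validate_column((4, 5, 1), headers))
print(validate_column(slice(-6, -1, 2), headers))
```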

Changelog:
  • Version 0.7.9 (2024-06-20):
    • New method.

_validate_indicies(indicies) tuple[tuple[int], list[str]][source]

Validates and returns a predictable notation form of indices for accessing rows and columns in the SQLDataModel from varying 2-dimensional (row, column) indexing input types.

Two dimensional indexing:
  • tuple[row_index, column_index]: Where row_index and column_index are defined below.

Row indexing:
  • int: Single integer index. E.g., sdm[0] or sdm[-1]

  • slice: Range of row indices. E.g., sdm[2:5] or sdm[-8:-1]

  • set[int]: Discontiguous row indices. E.g., sdm[{13, 7, 42}]

  • tuple[int]: Like set, discontiguous row indices. E.g., sdm[(-1, 9, 11)]

Column indexing:
  • int: Single integer index. E.g., sdm[:, 0] or sdm[:, -1]

  • str: Single column name. E.g., sdm['Col A'] or sdm['Name']

  • list[str]: List of column names. E.g., sdm[:,['A', 'B', 'F']]

  • list[int]: List of column indices. E.g., sdm[:,[0, 3, 4, 9, -2]]

Parameters:

indicies – Specifies the indices for rows and columns.

Raises:
  • TypeError – If the type of indices is invalid such as a float for row index or a boolean for a column name index.

  • ValueError – If the indices are outside the current model range or if a column is not found in the current model headers when indexed by column name as str.

  • IndexError – If the column indices are outside the current column range or if a column is not found in the current model headers when indexed by int.

Returns:

A tuple containing validated row indices as a tuple and validated column indices as a list of column names.

Return type:

tuple[tuple[int], list[str]]

Example:

import sqldatamodel as sdm

# Create a 10 rows by 4 columns model
df = sdm.from_shape(shape=(10,4), headers=['A','B','C','D'])

# Index pairs to validate
input_idx = [
    (0, 'A'),
    (-1, ['B','D']),
    ({2,-7,-2}, (-2,-1)),
    (slice(-6,-1,2), slice(0,3))
]

# Store validated pairs
valid_idx = []

# Loop over the [row, col] pairs
for row, col in input_idx:
    # Validate and store the pairs
    valid_idx.append(df._validate_indicies((row, col)))

# View input and validated pairs
for original, validated in zip(input_idx, valid_idx):
    print(f"{original} --> {validated}")

This will output both the input and validated row, column index pairs:

(0, 'A') --> ((0,), ['A'])
(-1, ['B', 'D']) --> ((9,), ['B', 'D'])
({-7, 2, -2}, (-2, -1)) --> ((3, 2, 8), ['C', 'D'])
(slice(-6, -1, 2), slice(0, 3, None)) --> ((4, 6, 8), ['A', 'B', 'C'])

Note

  • This method expects indicies to be provided as a two dimensional pair of (row, column) indices, with exceptions made for single row integer indexes or single column names.

  • Use empty slice notation to include all indices from a given dimension, for example sdm[:, :] will always return the full model by accessing all rows and all columns.

  • See SQLDataModel.__getitem__() and SQLDataModel.__setitem__() for implementations relying on this method.

  • See SQLDataModel._validate_row() for one dimensional validation against a single row index.

  • See SQLDataModel._validate_column() for one dimensional validation against a single column index.

Changelog:

_validate_row(row: int | slice | Iterable[int], unmodified: bool = False, allow_zero_rows: bool = True) tuple[int][source]

Utility function used to validate row selection and return parsed values.

Parameters:
  • row (int|slice|Iterable[int]) – The row selection to validate, argument should reflect the integer indexes of the rows to select.

  • unmodified (bool, optional) – Whether row should be returned as originally indexed. Default is False, returning as tuple.

  • allow_zero_rows (bool, optional) – Whether row, when provided as a slice, is allowed to return zero valid row indices. Default is True, validating on any slice argument.

Raises:
  • TypeError – If row is not one of type ‘int’, ‘slice’ or ‘Iterable’ representing the integer index of row(s) to select.

  • IndexError – If row is outside of current model range bounded by SQLDataModel.row_count whether positively or negatively indexed. Is not raised when allow_zero_rows is True and row is provided as a slice of indices.

Returns:

A tuple containing the validated row values resulting from the selection.

Return type:

tuple[int]

Example:

import sqldatamodel as sdm

# Create a 10 rows x 3 column model
df = sdm.from_shape(shape=(10, 3), headers=['A','B','C'])

# Various row index types
row_indicies = [
    2,
    -3,
    {4,5,8},
    [-1,5,0],
    slice(2, 5),
    slice(-8, -1, 2),
]

# Loop over the indicies
for row_index in row_indicies:
    # Print original and validated indexes
    print(f"{row_index} --> {df._validate_row(row_index)}")

This will output the original and validated row indexes:

2 --> (2,)
-3 --> (7,)
{8, 4, 5} --> (8, 4, 5)
[-1, 5, 0] --> (9, 5, 0)
slice(2, 5, None) --> (2, 3, 4)
slice(-8, -1, 2) --> (2, 4, 6, 8)

Note

  • Rows are referenced by their integer index and not their value, as such row = 0 and row = -1 will always return the first and last rows, respectively.

  • An input of row == SQLDataModel.row_count is allowed to accommodate the append row syntax of sdm[sdm.row_count] = (values).

  • See SQLDataModel._validate_column() for validating column indices and returning the corresponding headers.
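A plain Python approximation of the row normalization is sketched below. The function `validate_row` is hypothetical and takes the row count explicitly; it leaves out the unmodified and allow_zero_rows options as well as the special case for row == row_count.

```python
def validate_row(row, row_count):
    """Hypothetical normalizer returning non-negative row indexes as a tuple."""

    def resolve(index):
        if not isinstance(index, int) or isinstance(index, bool):
            raise TypeError(f"invalid row type '{type(index).__name__}'")
        if not -row_count <= index < row_count:
            raise IndexError(f"row index {index} out of range")
        # Negative indexes wrap around to count from the end
        return index % row_count

    if isinstance(row, slice):
        # range() follows the same slicing rules as a list, without copying data
        return tuple(range(row_count)[row])
    if isinstance(row, int):
        return (resolve(row),)
    return tuple(resolve(index) for index in row)


print(validate_row(-3, 10))
print(validate_row([-1, 5, 0], 10))
print(validate_row(slice(-8, -1, 2), 10))
```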

Changelog:
  • Version 2.3.1 (2026-01-22):
    • Modified to allow validation of slice index regardless of number of rows returned when allow_zero_rows is True.

  • Version 0.7.9 (2024-06-20):
    • New method.

add_column_with_values(column_name: str, value=None) None[source]

Adds a new column with the specified column_name to the SQLDataModel. The new column is populated with the values provided in the value argument. If value is not provided (default), the new column is populated with NULL values.

Parameters:
  • column_name (str) – The name of the new column to be added.

  • value – The value to populate the new column. If None (default), the column is populated with NULL values. If a valid column name is provided, the values of that column will be used to fill the new column.

Raises:
  • DimensionError – If the length of the provided values does not match the number of rows in the model.

  • TypeError – If the data type of the provided values is not supported or translatable to an SQL data type.

Example:

import sqldatamodel as sdm

# Create model from data
df = sdm.from_csv('data.csv')

# Add new column with default value 42
df.add_column_with_values('new_column', value=42)

# Add another new column by copying values from an existing column
df.add_column_with_values('copied_column', value='existing_column')

Note

  • Many other methods, including SQLDataModel.__setitem__() rely on this method, therefore modifying it may cause unpredictable behavior.

  • Whether to copy an existing column or to assign the string value itself is determined by SQLDataModel.__eq__() against both values.
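In SQL terms, the two cases from the example correspond roughly to adding a column with a constant default and adding a column that is then filled from another one. The sketch below shows both with `sqlite3`; the statements are assumptions about equivalent SQL, not the library's internals.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "existing_column" TEXT)')
con.executemany('INSERT INTO "sdm" VALUES (?, ?)', [(0, "a"), (1, "b")])

# New column filled with a constant; omitting DEFAULT would leave it NULL
con.execute('ALTER TABLE "sdm" ADD COLUMN "new_column" INTEGER DEFAULT 42')

# New column filled by copying an existing one
con.execute('ALTER TABLE "sdm" ADD COLUMN "copied_column" TEXT')
con.execute('UPDATE "sdm" SET "copied_column" = "existing_column"')

result = con.execute('SELECT * FROM "sdm" ORDER BY "idx"').fetchall()
print(result)
```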

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

append_row(values: list | tuple = None) None[source]

Appends values as a new row in the SQLDataModel at the next available index based on the current max row index from SQLDataModel.indicies. If values = None, an empty row with SQL null values will be used.

Parameters:

values (list or tuple, optional) – The values to be inserted into the row. If not provided or set to None, an empty row with SQL null values will be inserted.

Raises:
  • TypeError – If values is provided and is not of type list or tuple.

  • DimensionError – If the number of values provided does not match the current column count.

  • SQLProgrammingError – If there is an issue with the SQL execution during the insertion.

Returns:

None

Example:

import sqldatamodel as sdm

# Create a rowless model
df = sdm.SQLDataModel(headers=['Name', 'Age'])

# Append a row with values
df.append_row(['Alice', 31])

# Append another row
df.append_row(['John', 48])

# View result
print(df)

This will output:

┌───┬───────┬──────┐
│   │ Name  │ Age  │
├───┼───────┼──────┤
│ 0 │ Alice │ 31   │
│ 1 │ John  │ 48   │
└───┴───────┴──────┘
[2 rows x 2 columns]

Note

  • If no values are provided, None or SQL ‘null’ will be used for the values.

  • Rows will be appended to the bottom of the model at one index greater than the current max index.
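The "one index greater than the current max index" rule can be illustrated with a small `sqlite3` sketch. The helper `append_row` below is hypothetical and hardcodes a two-column table; note that the new index follows the maximum existing index rather than the row count.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "Name" TEXT, "Age" INTEGER)')
con.executemany('INSERT INTO "sdm" VALUES (?, ?, ?)', [(0, "Alice", 31), (5, "John", 48)])


def append_row(values=None):
    # Next index is one greater than the current maximum, or 0 for an empty table
    next_idx = con.execute('SELECT COALESCE(MAX("idx") + 1, 0) FROM "sdm"').fetchone()[0]
    # Missing values become SQL NULL for every column
    values = (None, None) if values is None else tuple(values)
    con.execute('INSERT INTO "sdm" VALUES (?, ?, ?)', (next_idx, *values))
    return next_idx


first = append_row(["Zoe", 27])
second = append_row()
rows = con.execute('SELECT * FROM "sdm" ORDER BY "idx"').fetchall()
print(rows)
```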

Changelog:

apply(func: Callable) SQLDataModel[source]

Applies func to the current SQLDataModel by passing the values of each row as arguments to func, and returns a modified SQLDataModel containing the output.

Parameters:

func (Callable) – A callable function to apply to the SQLDataModel.

Raises:
  • TypeError – If the provided argument for func is not a valid callable.

  • SQLProgrammingError – If the provided function is not valid based on the current SQL datatypes.

Returns:

A modified SQLDataModel resulting from the application of func.

Return type:

SQLDataModel

Examples:

Applying to Single Column

import sqldatamodel as sdm

# Create the SQLDataModel:
df = sdm.from_csv('employees.csv', headers=['First Name', 'Last Name', 'City', 'State'])

# Create the function:
def uncase_name(x):
    return x.lower()

# Apply to existing column:
df['First Name'] = df['First Name'].apply(uncase_name) # existing column will be updated with new values

# Or create new one by passing in a new column name:
df['New Column'] = df['First Name'].apply(uncase_name) # new column will be created with returned values

Applying to Multiple Columns

import sqldatamodel as sdm

# Create the function, note that func must have the same number of args as the model .apply() is called on:
def summarize_employee(first, last, city, state):
    summary = f"{first} {last} is from {city}, {state}"
    return summary

# Create a new 'Employee Summary' column for the returned values:
df['Employee Summary'] = df.apply(summarize_employee)
Applying a Built-in Function
import math
import sqldatamodel as sdm

# Create the SQLDataModel:
df = sdm.from_csv('number-data.csv', headers=['Number'])

# Apply the math.sqrt function to the original 'Number' column:
df_sqrt = df.apply(math.sqrt)
Applying a Lambda Function
import sqldatamodel as sdm

# Create the SQLDataModel:
df = sdm.from_csv('example.csv', headers=['Column1', 'Column2'])

# Create a new 'Column3' using the values returned from the lambda function:
df['Column3'] = df.apply(lambda x, y: x + y)

# Alternatively, an existing column can be updated in place:
df['Column1'] = df['Column1'].apply(lambda x: x // 4)

Note

  • The number of args in the inspected signature of func must equal the current number of SQLDataModel columns, otherwise an Exception will be raised.

  • Use SQLDataModel.generate_apply_function_stub() method to return a preconfigured template using current SQLDataModel columns and dtypes to assist.
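
As a rough illustration of the argument-count rule in the notes above, the following standalone sketch uses only the standard library. The helper apply_rowwise is hypothetical and not part of SQLDataModel, and ValueError stands in for whatever exception the library raises; the sketch simply inspects the signature of func and refuses to run when the parameter count differs from the number of columns.

```python
import inspect

def apply_rowwise(rows, func):
    """Hypothetical helper: call func once per row, one argument per column."""
    n_params = len(inspect.signature(func).parameters)
    n_cols = len(rows[0]) if rows else 0
    if n_params != n_cols:
        raise ValueError(f"func takes {n_params} args but rows have {n_cols} columns")
    return [func(*row) for row in rows]

rows = [("Alice", "Smith"), ("Bob", "Jones")]
full_names = apply_rowwise(rows, lambda first, last: f"{first} {last}")
print(full_names)  # ['Alice Smith', 'Bob Jones']
```

Calling apply_rowwise(rows, lambda x: x) on the same two-column data raises the error, which is the situation the notes warn about.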

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

apply_function_to_column(func: Callable, column: str | int) None[source]

Applies the specified callable function (func) to the provided SQLDataModel column. The function’s output is used to update the values in the column. For broader uses or more input flexibility, see related method apply().

Parameters:
  • func (Callable) – The callable function to apply to the column.

  • column (str | int) – The name or index of the column to which the function will be applied.

Raises:
  • TypeError – If the provided column argument is not a valid type (str or int).

  • IndexError – If the provided column index is outside the valid range of column indices.

  • ValueError – If the provided column name is not valid for the current model.

  • SQLProgrammingError – If the provided function return types or arg count is invalid or incompatible to SQL types.

Returns:

None

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('data.csv')

# Apply upper() method using lambda function to column ``name``
df.apply_function_to_column(lambda x: x.upper(), column='name')

# Apply addition through a single-argument lambda function to the column at index 1
df.apply_function_to_column(lambda x: x + 1, column=1)

Note

  • This method is a simplified version of the SQLDataModel.apply() method, which can be used for arbitrary function params and inputs.

  • If providing a function name, ensure it can be used as a valid sqlite3 identifier for the instance’s connection, otherwise SQLProgrammingError will be raised.
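
The note on sqlite3 identifiers suggests the callable is registered on the underlying connection. The sketch below shows that general sqlite3 mechanism using only the standard library. The table name people, the registered name to_upper, and the UPDATE statement are assumptions made for illustration and do not reflect SQLDataModel internals.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, age INTEGER)")
con.executemany("INSERT INTO people VALUES (?, ?)", [("alice", 31), ("john", 48)])

# Register a one-argument Python callable under a SQL-visible name
con.create_function("to_upper", 1, lambda x: x.upper())

# Use the registered function to rewrite a single column
con.execute("UPDATE people SET name = to_upper(name)")
names = [row[0] for row in con.execute("SELECT name FROM people ORDER BY age")]
print(names)  # ['ALICE', 'JOHN']
con.close()
```

Because the function is registered with exactly one argument, a callable that expects two values cannot be used this way on a single column.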

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

astype(dtype: Callable | Type | Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str']) SQLDataModel[source]

Casts the model data into the specified python dtype.

Parameters:

dtype (Callable|Type|Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str']) – The target python data type to cast the values to.

Raises:
  • ValueError – If dtype is a string and not one of ‘bool’, ‘bytes’, ‘date’, ‘datetime’, ‘float’, ‘int’, ‘None’, ‘str’.

  • TypeError – If dtype is a Type object that does not map to the current values, such as trying to convert a string column using the built-in float type.

Returns:

The data cast to the specified type as a new SQLDataModel.

Return type:

SQLDataModel

Warning

  • Type casting will coerce any nonconforming values to the dtype being set. This means data will be lost when casting values to incompatible types.

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height', 'Hired']
data = [
    ('John', 30, 175.3, 'True'),
    ('Alice', 28, 162.0, 'True'),
    ('Travis', 35, 185.8, 'False')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# See what we're working with
print(df)

This will output:

┌────────┬──────┬─────────┬───────┐
│ Name   │  Age │  Height │ Hired │
├────────┼──────┼─────────┼───────┤
│ John   │   30 │  175.30 │ True  │
│ Alice  │   28 │  162.00 │ True  │
│ Travis │   35 │  185.80 │ False │
└────────┴──────┴─────────┴───────┘
[3 rows x 4 columns]

We can return the values as new types or save them to a column:

# Convert the string based 'Hired' column to boolean values
df['Hired'] = df['Hired'].astype('bool')

# Let's also create a new 'Height int' column with the heights as integers
df['Height int'] = df['Height'].astype('int')

# See the new values and their types
print(df)

This will output:

┌────────┬──────┬─────────┬───────┬────────────┐
│ Name   │  Age │  Height │ Hired │ Height int │
├────────┼──────┼─────────┼───────┼────────────┤
│ John   │   30 │  175.30 │ 1     │        175 │
│ Alice  │   28 │  162.00 │ 1     │        162 │
│ Travis │   35 │  185.80 │ 0     │        185 │
└────────┴──────┴─────────┴───────┴────────────┘
[3 rows x 5 columns]

Types can also be passed directly to dtype:

# Convert 'Age' directly to float using the built-in type:
df['Age float'] = df['Age'].astype(float)

# View updated model
print(df)

This will output the result of mapping the built-in float type to ‘Age’ as a new column:

┌────────┬─────┬─────────┬───────┬────────────┬───────────┐
│ Name   │ Age │  Height │ Hired │ Height int │ Age float │
├────────┼─────┼─────────┼───────┼────────────┼───────────┤
│ John   │  30 │  175.30 │ 1     │        175 │     30.00 │
│ Alice  │  28 │  162.00 │ 1     │        162 │     28.00 │
│ Travis │  35 │  185.80 │ 0     │        185 │     35.00 │
└────────┴─────┴─────────┴───────┴────────────┴───────────┘
[3 rows x 6 columns]

Note

  • Unless the returned values are saved as a new column, using this method does not change the type currently assigned to the underlying column. To modify the column type, use SQLDataModel.set_column_dtypes() instead.

  • Any None or null values encountered will not be coerced to the specified dtype, see SQLDataModel.fillna() for handling and filling null values appropriately.

  • When passing a type directly, dtype=Type, the type must be a Callable that can be mapped directly to a value like the built-in str, int, float and bool types.
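
A minimal sketch of the behaviour described in these notes, assuming a plain list of values rather than a model: a callable type is mapped over each value while None is left untouched. The helper cast_values is hypothetical and only illustrates the idea.

```python
def cast_values(values, dtype):
    """Hypothetical helper: map dtype over values, leaving None untouched."""
    return [None if v is None else dtype(v) for v in values]

heights = [175.3, 162.0, None, 185.8]
as_int = cast_values(heights, int)
as_str = cast_values(heights, str)
print(as_int)  # [175, 162, None, 185]
print(as_str)  # ['175.3', '162.0', None, '185.8']
```

The integer result drops the fractional part, which is the kind of data loss the warning for this method refers to.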

Changelog:
  • Version 0.7.6 (2024-06-16):
    • Modified to allow Callable or Type to be provided directly for dtype argument to map to data and return as new model for broader type conversion.

  • Version 0.2.1 (2024-03-24):
    • New method.

column_alignment[source]

The column alignment to use for string representations of the data. Value must be one of ['dynamic','left','center','right']. Default is 'dynamic', which right-aligns numeric columns and left-aligns all others.

Type:

str

column_count[source]

The current column count of the model.

Type:

int

concat(other: SQLDataModel | list | tuple, inplace: bool = True) None | SQLDataModel[source]

Concatenates the provided data to SQLDataModel along the row axis, returning a new model or modifying the existing instance inplace.

Parameters:
  • other (SQLDataModel | list | tuple) – The SQLDataModel, list, or tuple to concatenate or append.

  • inplace (bool, optional) – If True (default), performs the concatenation in-place, modifying the current model. If False, returns a new SQLDataModel instance with the concatenated result.

Returns:

None when inplace = True and SQLDataModel when inplace = False

Return type:

None or SQLDataModel

Raises:
  • TypeError – If the other argument is not one of type SQLDataModel, list, or tuple.

  • ValueError – If other is a list or tuple with insufficient data where the column dimension is < 1.

  • DimensionError – If the column count of the current model does not match the column count of the other model or tuple.

Example:

import sqldatamodel as sdm

# Datasets a and b
data_a = (['A', 1], ['B', 2])
data_b = (['C', 3], ['D', 4])

# Create the models
df_a = sdm.SQLDataModel(data_a, headers=['letter', 'number'])
df_b = sdm.SQLDataModel(data_b, headers=['letter', 'number'])

# Concatenate the two models
df_ab = df_a.concat(df_b, inplace=False)

# View result
print(df_ab)

This will output:

┌────────┬────────┐
│ letter │ number │
├────────┼────────┤
│ A      │ 1      │
│ B      │ 2      │
│ C      │ 3      │
│ D      │ 4      │
└────────┴────────┘
[4 rows x 2 columns]

Concatenation can be done using other objects as well:

# List or tuples can also be used directly
data_e = ['E', 5]

# Append in place
df_ab.concat(data_e)

# View result
print(df_ab)

This will output:

┌───┬────────┬────────┐
│   │ letter │ number │
├───┼────────┼────────┤
│ 0 │ A      │      1 │
│ 1 │ B      │      2 │
│ 2 │ C      │      3 │
│ 3 │ D      │      4 │
│ 4 │ E      │      5 │
└───┴────────┴────────┘
[5 rows x 2 columns]

Note

  • Models must be of compatible dimensions with equal column_count or equivalent dimension if list or tuple

  • Headers are inherited from the model calling the SQLDataModel.concat() method whether done inplace or being returned as new instance.
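
The dimension requirement can be sketched on plain row data as follows. The helper concat_rows is hypothetical, assumes the first argument is non-empty, and uses ValueError as a stand-in for the library's DimensionError.

```python
def concat_rows(rows, other):
    """Hypothetical helper: append other to rows after checking column counts."""
    # Treat a flat list or tuple such as ['E', 5] as a single row
    if other and not isinstance(other[0], (list, tuple)):
        other = [other]
    width = len(rows[0])
    for row in other:
        if len(row) != width:
            raise ValueError(f"expected {width} columns, got {len(row)}")
    return [tuple(r) for r in rows] + [tuple(r) for r in other]

ab = concat_rows([("A", 1), ("B", 2)], [("C", 3), ("D", 4)])
abe = concat_rows(ab, ["E", 5])
print(abe)  # [('A', 1), ('B', 2), ('C', 3), ('D', 4), ('E', 5)]
```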

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

contains(pat: str | Iterable[str], case: bool = True) set[int][source]

Return the row indices that contain the specified pattern(s) in any column from the model, converting to str(value) for comparison.

Parameters:
  • pat (str | Iterable[str]) – The pattern or iterable of patterns to search for within the data.

  • case (bool, optional) – If True (default), the search is case-sensitive. If False, the search is case-insensitive.

Raises:

TypeError – If argument for pat is not of type ‘str’ or an iterable of type ‘str’ representing the substring pattern(s).

Returns:

Set of row indices containing values that match the pattern(s).

Return type:

set[int]

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Sex', 'City']
data = [
    ('Mike', 31, 'M', 'Chicago'),
    ('John', 25, 'M', 'Dayton'),
    ('Alice', 27, 'F', 'Boston'),
    ('Sarah', 35, 'F', 'Houston'),
    ('Bobby', 42, 'M', 'Chicago'),
    ('Steve', 28, 'F', 'Austin'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter for rows containing the string 'Chicago'
matching_indicies = df['City'].contains('Chicago')

# Apply filter to model
df_chicago = df[matching_indicies]

# View result
print(df_chicago)

This will output the result of applying the filter to the model:

┌───┬───────┬─────┬─────┬─────────┐
│   │ Name  │ Age │ Sex │ City    │
├───┼───────┼─────┼─────┼─────────┤
│ 0 │ Mike  │  31 │ M   │ Chicago │
│ 4 │ Bobby │  42 │ M   │ Chicago │
└───┴───────┴─────┴─────┴─────────┘
[2 rows x 4 columns]

Instead of searching a single column, the entire model can be searched:

# Method can also search all columns, and be applied directly
df_with_e = df[df.contains('E', case=False)]

# View result
print(df_with_e)

This will output the result of a case-insensitive search:

┌───┬───────┬─────┬─────┬─────────┐
│   │ Name  │ Age │ Sex │ City    │
├───┼───────┼─────┼─────┼─────────┤
│ 0 │ Mike  │  31 │ M   │ Chicago │
│ 2 │ Alice │  27 │ F   │ Boston  │
│ 5 │ Steve │  28 │ F   │ Austin  │
└───┴───────┴─────┴─────┴─────────┘
[3 rows x 4 columns]

This can be used in combination with the setitem syntax to selectively update values as well:

# Create a 'State' column with a default value
df['State'] = None

# Filter and set the values that contain the pattern
df[df.contains('Chicago'), 'State'] = 'Illinois'

# Multiple conditions can be used
tx_1 = df.contains('Houston')
tx_2 = df.contains('Austin')

# Then chained together using set notation
df[(tx_1 | tx_2), 'State'] = 'Texas'

# Alternatively, an iterable of patterns can be provided
df[df.contains(['Houston','Austin']), 'State'] = 'Texas'
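
Because the method returns a set of row indices, results combine with ordinary set operators. The standalone sketch below mimics that behaviour on plain tuples, assuming simple substring matching on str(value); the helper contains defined here is a hypothetical stand-in, not the library method.

```python
def contains(rows, pat, case=True):
    """Hypothetical helper: indices of rows where any str(value) contains a pattern."""
    patterns = [pat] if isinstance(pat, str) else list(pat)
    if not case:
        patterns = [p.lower() for p in patterns]
    matches = set()
    for idx, row in enumerate(rows):
        cells = [str(v) if case else str(v).lower() for v in row]
        if any(p in cell for p in patterns for cell in cells):
            matches.add(idx)
    return matches

rows = [
    ("Mike", 31, "M", "Chicago"),
    ("John", 25, "M", "Dayton"),
    ("Alice", 27, "F", "Boston"),
    ("Sarah", 35, "F", "Houston"),
    ("Bobby", 42, "M", "Chicago"),
    ("Steve", 28, "F", "Austin"),
]
texas = contains(rows, "Houston") | contains(rows, "Austin")
print(texas == contains(rows, ["Houston", "Austin"]))  # True
print(sorted(contains(rows, "E", case=False)))  # [0, 2, 5]
```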

Changelog:
  • Version 0.7.8 (2024-06-18):
    • New method.

copy(data_only: bool = False) SQLDataModel[source]

Returns a deep copy of the current model as a new SQLDataModel.

Parameters:

data_only (bool) – If True, only the data is copied, otherwise display and styling parameters are included. Default is False.

Returns:

A copy of the original as a new SQLDataModel.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the original model with list styling
df = sdm.SQLDataModel(data, headers, table_style='list')

# Create two copies, one full and one with data only
copy_full = df.copy()
copy_data = df.copy(data_only=True)

# View both copies
print(copy_full)
print(copy_data)

This will output both copies, with copy_full including any styling parameters such as table_style='list':

   Name    Age   Height
-  ------  ---  -------
0  John     30   175.30
1  Alice    28   162.00
2  Travis   35   185.80

With the output for copy_data containing only the original model’s data:

┌───┬────────┬─────┬─────────┐
│   │ Name   │ Age │  Height │
├───┼────────┼─────┼─────────┤
│ 0 │ John   │  30 │  175.30 │
│ 1 │ Alice  │  28 │  162.00 │
│ 2 │ Travis │  35 │  185.80 │
└───┴────────┴─────┴─────────┘

Changelog:
  • Version 0.4.2 (2024-05-03):
    • New method.

count() SQLDataModel[source]

Returns a new SQLDataModel containing the counts of non-null values for each column in a row-wise orientation.

Returns:

A new SQLDataModel containing the counts of non-null values in each column.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data with missing values
headers = ['Name', 'Age', 'Gender', 'Tenure']
data = [
    ('Alice', 25, 'Female', 1.0),
    ('Bob', None, 'Male', 2.7),
    ('Charlie', 30, 'Male', None),
    ('David', None, 'Male', 3.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get counts
counts = df.count()

# View result
print(counts)

This will output the count of all non-null values for each column:

┌──────┬─────┬────────┬────────┐
│ Name │ Age │ Gender │ Tenure │
├──────┼─────┼────────┼────────┤
│    4 │   2 │      4 │      3 │
└──────┴─────┴────────┴────────┘
[1 rows x 4 columns]

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

count_unique() SQLDataModel[source]

Returns a new SQLDataModel containing the total counts and unique values for each column in the model for both null and non-null values.

Metrics:
  • 'column' contains the names of the columns counted.

  • 'na' contains the total number of null values in the column.

  • 'unique' contains the total number of unique values in the column.

  • 'count' contains the total number of non-null values in the column.

  • 'total' contains the total number of all null and non-null values in the column.

Returns:

A new SQLDataModel containing columns ‘column’, ‘na’, ‘unique’, ‘count’ and ‘total’ representing the column name, null value count, unique value count, non-null value count and total value count, respectively.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Gender']
data = [
    ('Alice', 25, 'Female'),
    ('Bob', 30, None),
    ('Alice', 25, 'Female')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get the value count information
count_model = df.count_unique()

# View the count information
print(count_model)

This will output:

┌────────┬──────┬────────┬───────┬───────┐
│ column │   na │ unique │ count │ total │
├────────┼──────┼────────┼───────┼───────┤
│ Name   │    0 │      2 │     3 │     3 │
│ Age    │    0 │      2 │     3 │     3 │
│ Gender │    1 │      1 │     2 │     3 │
└────────┴──────┴────────┴───────┴───────┘
[3 rows x 5 columns]

Note

  • See SQLDataModel.count() for the count of non-null values for each column in a row-wise orientation.
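
The five metrics can be reproduced on plain data with a few lines of standard Python. This is a sketch under the assumption that unique counts exclude null values, which matches the example output for this method; the helper count_unique below is hypothetical.

```python
def count_unique(headers, rows):
    """Hypothetical helper: per-column (column, na, unique, count, total)."""
    summary = []
    for i, name in enumerate(headers):
        values = [row[i] for row in rows]
        non_null = [v for v in values if v is not None]
        na = len(values) - len(non_null)
        summary.append((name, na, len(set(non_null)), len(non_null), len(values)))
    return summary

headers = ['Name', 'Age', 'Gender']
data = [('Alice', 25, 'Female'), ('Bob', 30, None), ('Alice', 25, 'Female')]
result = count_unique(headers, data)
for line in result:
    print(line)
```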

Changelog:
  • Version 0.3.2 (2024-04-02):
    • Renamed method from counts to count_unique for more precise definition.

    • New method.

data(index: bool = False, include_headers: bool = False, strict_2d: bool = False) list[tuple][source]

Returns the SQLDataModel data as a list of tuples for multiple rows, a single tuple for individual rows, or a single item for individual cells. Data is returned without index and headers by default, use include_headers=True or index=True to modify.

Parameters:
  • index (bool, optional) – If True, includes the index in the result; if False, excludes the index. Default is False.

  • include_headers (bool, optional) – If True, includes column headers in the result; if False, excludes headers. Default is False.

  • strict_2d (bool, optional) – If True, returns data as a 2-dimensional list of tuples regardless of data dimension. Default is False.

Returns:

The data currently stored in the model as a list of tuples.

Return type:

list[tuple]

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers, display_float_precision=2)

# View full table
print(df)

This will output:

┌────────┬──────┬─────────┐
│ Name   │  Age │  Height │
├────────┼──────┼─────────┤
│ John   │   30 │  175.30 │
│ Alice  │   28 │  162.00 │
│ Travis │   35 │  185.80 │
└────────┴──────┴─────────┘
[3 rows x 3 columns]

Get data for specific row:

# Grab data from single row
row_data = df[0].data()

# View it
print(row_data)

This will output the row as a tuple of values:

('John', 30, 175.3)

Get data for specific column:

# Grab data from single column
col_data = df['Name'].data()

# View it
print(col_data)

This will output the column values as a list of tuples:

[('John',), ('Alice',), ('Travis',)]

Note

  • Many other SQLDataModel methods rely on this method, changing it will lead to undefined behavior.

  • See related SQLDataModel.from_data() for creating a new SQLDataModel from existing data sources.

  • Use strict_2d = True to always return data as a list of tuples regardless of data dimension.
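
The way the return shape depends on the data dimension can be sketched as below. The helper shape_result is hypothetical and only mirrors the shapes described for this method; it says nothing about how the library retrieves the rows.

```python
def shape_result(rows, strict_2d=False):
    """Hypothetical helper mirroring the return shapes described for data()."""
    if strict_2d:
        return rows
    if len(rows) == 1:
        row = rows[0]
        # A single cell collapses to the bare value, a single row to one tuple
        return row[0] if len(row) == 1 else row
    return rows

print(shape_result([("John", 30, 175.3)]))        # ('John', 30, 175.3)
print(shape_result([("John",)]))                  # John
print(shape_result([("John",)], strict_2d=True))  # [('John',)]
print(shape_result([("John",), ("Alice",)]))      # [('John',), ('Alice',)]
```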

Changelog:
  • Version 0.10.0 (2024-06-29):
  • Version 0.5.0 (2024-05-09):
    • Added strict_2d parameter to allow predictable return type regardless of data dimension.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.5 (2023-11-24):
    • New method.

deduplicate(subset: list[str] = None, reset_index: bool = True, keep_first: bool = True, inplace: bool = True) None | SQLDataModel[source]

Removes duplicate rows from the SQLDataModel based on the specified subset of columns. Deduplication occurs inplace by default, otherwise use inplace=False to return a new SQLDataModel.

Parameters:
  • subset (list[str], optional) – List of columns to consider when identifying duplicates. If None, all columns are considered. Defaults to None.

  • reset_index (bool, optional) – If True, resets the index after deduplication starting at 0; otherwise retains current indices.

  • keep_first (bool, optional) – If True, keeps the first occurrence of each duplicated row; otherwise, keeps the last occurrence. Defaults to True.

  • inplace (bool, optional) – If True, modifies the current SQLDataModel in-place; otherwise, returns a new SQLDataModel without duplicates. Defaults to True.

Raises:

ValueError – If a column specified in subset is not found in the SQLDataModel.

Returns:

If inplace = True, the method modifies the current SQLDataModel in-place and returns None; otherwise, if inplace = False, a new SQLDataModel is returned.

Return type:

None or SQLDataModel

Examples:

Based on Single Column
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Deduplicate based on a specific column
df.deduplicate(subset=['ID'], keep_first=True, inplace=True)
Based on Multiple Columns
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Deduplicate based on multiple columns and save to keep both models
df_deduped = df.deduplicate(subset=['ID', 'Name'], keep_first=False, inplace=False)

Note

  • Ordering for keep_first is determined by the current SQLDataModel.sql_idx order of the instance.

  • For multiple columns ordering is done sequentially favoring first index in subset, then i+1, …, to i+len(subset)
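
The effect of keep_first can be sketched on plain rows, assuming that row order stands in for the index order mentioned above. The helper deduplicate below is hypothetical and takes column positions instead of names.

```python
def deduplicate(rows, subset_idx, keep_first=True):
    """Hypothetical helper: drop rows whose values at subset_idx repeat."""
    kept = {}
    for position, row in enumerate(rows):
        key = tuple(row[i] for i in subset_idx)
        # Keep the first occurrence, or overwrite so the last occurrence wins
        if key not in kept or not keep_first:
            kept[key] = (position, row)
    # Restore the original row order of the surviving rows
    return [row for _, row in sorted(kept.values(), key=lambda item: item[0])]

rows = [(1, "Ann", 10), (2, "Bob", 20), (1, "Ann", 30)]
first_kept = deduplicate(rows, [0])
last_kept = deduplicate(rows, [0], keep_first=False)
print(first_kept)  # [(1, 'Ann', 10), (2, 'Bob', 20)]
print(last_kept)   # [(2, 'Bob', 20), (1, 'Ann', 30)]
```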

Changelog:
  • Version 0.1.5 (2023-11-24):
    • New method.

describe(exclude_columns: str | list = None, exclude_dtypes: list[Literal['str', 'int', 'float', 'date', 'datetime', 'bool']] = None, ignore_na: bool = True, **kwargs) SQLDataModel[source]

Generates descriptive statistics for each column in the SQLDataModel instance based on its dtype, including count, unique values, top value, frequency, mean, standard deviation, minimum, 25th, 50th and 75th percentiles, maximum and dtype.

Parameters:
  • exclude_columns (str | list, optional) – Columns to exclude from the analysis. Default is None.

  • exclude_dtypes (list[Literal["str", "int", "float", "date", "datetime", "bool"]], optional) – Data types to exclude from the analysis. Default is None.

  • ignore_na (bool, optional) – If True, ignores NA like values (‘NA’, ‘ ‘, ‘None’) when computing statistics. Default is True.

  • **kwargs – Additional keyword arguments to be passed to the execute_fetch method.

Statistics Described:
  • count: Total number of non-null values for specified column

  • unique: Total number of unique values for specified column

  • top: Top value represented for specified column, ties broken arbitrarily

  • freq: Frequency of corresponding value represented in ‘top’ metric

  • mean: Mean as calculated by summing all values and dividing by ‘count’

  • std: Standard Deviation for specified column

    • Uncorrected sample standard deviation for int, float dtypes

    • Mean time difference represented in number of days for date, datetime dtypes

    • ‘NaN’ for all other dtypes

  • min: Minimum value for specified column

    • Least value for int, float dtypes

    • Least value sorted by alphabetical ascending for str dtypes

    • Earliest date or datetime for date, datetime dtypes

  • p25: Percentile, 25th

    • Max first bin value as determined by quartered binning of values for int, float dtypes

    • ‘NaN’ for all other dtypes

  • p50: Percentile, 50th

    • Max second bin value as determined by quartered binning of values for int, float dtypes

    • ‘NaN’ for all other dtypes

  • p75: Percentile, 75th

    • Max third bin value as determined by quartered binning of values for int, float dtypes

    • ‘NaN’ for all other dtypes

  • max: Maximum value for specified column

    • Greatest value for int, float dtypes

    • Greatest value sorted by alphabetical ascending for str dtypes

    • Latest date or datetime for date, datetime dtypes

  • dtype: Datatype of specified column

    • Python datatype as determined by relevant class __name__ attribute, e.g. ‘float’ or ‘int’

    • dtypes can be excluded by using exclude_dtypes parameter

Returns:

A new SQLDataModel containing a comprehensive set of descriptive statistics for selected columns.

Return type:

SQLDataModel

Note

  • Standard deviation is calculated using uncorrected sample standard deviation for numeric dtypes, and timediff in days for datetime dtypes

  • Ties in unique, top and freq columns are broken arbitrarily as determined by first ordering of values prior to calling describe()

  • Ties encountered when binning for p25, p50, p75 will favor lower bins for data that cannot be quartered cleanly

  • Metrics for count, min, p25, p50, p75 and max include non-null values only

  • Using ignore_na=True only affects inclusion of ‘NA like’ values such as empty strings

  • Floating point precision determined by SQLDataModel.display_float_precision attribute

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('employees.csv')

# View all 10 rows
print(df)

This will output:

┌───┬──────────────────┬────────────┬─────────────┬───────────────┬────────┬─────────────────────┐
│   │ name             │ hire_date  │ country     │ service_years │    age │ last_update         │
├───┼──────────────────┼────────────┼─────────────┼───────────────┼────────┼─────────────────────┤
│ 0 │ Pamela Berg      │ 2007-06-06 │ New Zealand │          3.02 │     56 │ 2023-08-12 17:13:46 │
│ 1 │ Mason Hoover     │ 2009-04-19 │ Australia   │          5.01 │     41 │ 2023-05-18 01:29:44 │
│ 2 │ Veda Suarez      │ 2007-07-02 │ Ukraine     │          4.65 │     26 │ 2023-12-09 15:38:01 │
│ 3 │ John Smith       │ 2017-08-12 │ New Zealand │          3.81 │     35 │ 2023-03-10 18:23:56 │
│ 4 │ Xavier McCoy     │ 2021-04-03 │ France      │          2.95 │     42 │ 2023-09-27 11:39:08 │
│ 5 │ John Smith       │ 2020-10-11 │ Germany     │          4.61 │     56 │ 2023-12-09 18:41:52 │
│ 6 │ Abigail Mays     │ 2021-07-25 │ Costa Rica  │          5.34 │     50 │ 2023-02-11 16:43:07 │
│ 7 │ Rama Galloway    │ 2009-02-09 │ Italy       │          3.87 │     24 │ 2023-03-13 16:08:48 │
│ 8 │ Lucas Rodriquez  │ 2018-06-19 │ New Zealand │          2.73 │     28 │ 2023-03-17 01:45:22 │
│ 9 │ Hunter Donaldson │ 2015-12-18 │ Belgium     │          4.58 │     43 │ 2023-04-06 03:22:54 │
└───┴──────────────────┴────────────┴─────────────┴───────────────┴────────┴─────────────────────┘
[10 rows x 6 columns]

Now that we have our SQLDataModel, we can generate some statistics:

# Generate statistics
df_described = df.describe()

# View stats
print(df_described)

This will output:

┌────────┬──────────────┬─────────────┬─────────────┬───────────────┬────────┬─────────────────────┐
│ metric │         name │   hire_date │     country │ service_years │    age │         last_update │
├────────┼──────────────┼─────────────┼─────────────┼───────────────┼────────┼─────────────────────┤
│ count  │           10 │          10 │          10 │            10 │     10 │                  10 │
│ unique │            9 │          10 │           8 │            10 │      9 │                  10 │
│ top    │   John Smith │  2021-07-25 │ New Zealand │          5.34 │     56 │ 2023-12-09 18:41:52 │
│ freq   │            2 │           1 │           3 │             1 │      2 │                   1 │
│ mean   │          NaN │  2014-11-24 │         NaN │          4.06 │     40 │ 2023-06-16 19:18:39 │
│ std    │          NaN │ 2164.4 days │         NaN │          0.92 │     11 │         117.58 days │
│ min    │ Abigail Mays │  2007-06-06 │   Australia │          2.73 │     24 │ 2023-02-11 16:43:07 │
│ p25    │          NaN │  2009-02-09 │         NaN │          3.02 │     28 │ 2023-03-13 16:08:48 │
│ p50    │          NaN │  2017-08-12 │         NaN │          4.58 │     42 │ 2023-05-18 01:29:44 │
│ p75    │          NaN │  2020-10-11 │         NaN │          4.65 │     50 │ 2023-09-27 11:39:08 │
│ max    │ Xavier McCoy │  2021-07-25 │     Ukraine │          5.34 │     56 │ 2023-12-09 18:41:52 │
│ dtype  │          str │        date │         str │         float │    int │            datetime │
└────────┴──────────────┴─────────────┴─────────────┴───────────────┴────────┴─────────────────────┘
[12 rows x 7 columns]
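
The p25, p50 and p75 rows for the numeric columns above are consistent with one plausible reading of "quartered binning": sort the non-null values, split them into four bins with any remainder assigned to the lowest bins, and report the largest value of the first three bins. The sketch below implements that reading in plain Python; the helper quartile_bin_maxima is hypothetical and the library's actual query may differ.

```python
def quartile_bin_maxima(values):
    """Hypothetical helper: largest value in each of the first three quarter bins."""
    ordered = sorted(v for v in values if v is not None)
    # Split into four bins; any remainder goes to the lowest bins first
    base, extra = divmod(len(ordered), 4)
    maxima, end = [], 0
    for bin_number in range(3):
        end += base + (1 if bin_number < extra else 0)
        maxima.append(ordered[end - 1] if end else None)
    return tuple(maxima)

service_years = [3.02, 5.01, 4.65, 3.81, 2.95, 4.61, 5.34, 3.87, 2.73, 4.58]
ages = [56, 41, 26, 35, 42, 56, 50, 24, 28, 43]
print(quartile_bin_maxima(service_years))  # (3.02, 4.58, 4.65)
print(quartile_bin_maxima(ages))           # (28, 42, 50)
```

With ten values the bins hold 3, 3, 2 and 2 rows, which is how ties "favor lower bins" when the data cannot be quartered cleanly.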

Specific columns or data types can be excluded from result:

# Set filters to exclude all str dtypes and the 'hire_date' column:
df_described = df.describe(exclude_dtypes=['str'], exclude_columns=['hire_date'])

# View statistics
print(df_described)

This will output:

┌────────┬───────────────┬────────┬─────────────────────┐
│ metric │ service_years │    age │         last_update │
├────────┼───────────────┼────────┼─────────────────────┤
│ count  │            10 │     10 │                  10 │
│ unique │            10 │      9 │                  10 │
│ top    │          5.34 │     56 │ 2023-12-09 18:41:52 │
│ freq   │             1 │      2 │                   1 │
│ mean   │          4.06 │     40 │ 2023-06-16 19:18:39 │
│ std    │          0.92 │     11 │         117.58 days │
│ min    │          2.73 │     24 │ 2023-02-11 16:43:07 │
│ p25    │          3.02 │     28 │ 2023-03-13 16:08:48 │
│ p50    │          4.58 │     42 │ 2023-05-18 01:29:44 │
│ p75    │          4.65 │     50 │ 2023-09-27 11:39:08 │
│ max    │          5.34 │     56 │ 2023-12-09 18:41:52 │
│ dtype  │         float │    int │            datetime │
└────────┴───────────────┴────────┴─────────────────────┘
[12 rows x 4 columns]

Important

  • Generally, do not rely on SQLDataModel to do statistics, use NumPy or a real scientific computing library instead.

Note

  • Use SQLDataModel.infer_dtypes() to cast columns to their apparent data type, or set it manually with SQLDataModel.set_column_dtypes() to convert columns to different data types.

  • Statistics for date and datetime can be unpredictable if formatting used is inconsistent with conversion to Julian days or if column data type is incorrect.

Changelog:
  • Version 0.6.3 (2024-05-16):
    • Modified model to output values as string data types and set columns to right-aligned if arguments are not present in kwargs to retain metric resolution while having numeric alignment.

  • Version 0.1.9 (2024-03-19):
    • New method.

display_color[source]

The display color to use for string representations of the model. Default is None, using the standard terminal color.

Type:

ANSIColor

display_float_precision[source]

The floating point precision to use for string representations of the table, does not affect the actual floating point values stored in the model. Default is 2.

Type:

int

display_index[source]

Determines whether the index column is displayed when string representations of the table are generated. Default is True.

Type:

bool

display_max_rows[source]

The maximum number of rows to display. Default is 1,000 rows.

Type:

int

drop_column(column: int | str | list, inplace: bool = True) None | SQLDataModel[source]

Drops the specified column(s) from the SQLDataModel. Values for column can be a single column name or index, or a list of multiple column names or indices to drop from the model.

Parameters:
  • column (int | str | list) – The index, name, or list of indices/names of the column(s) to drop.

  • inplace (bool) – If True, drops the column(s) in-place and updates the model metadata. If False, returns a new SQLDataModel object without the dropped column(s) and does not modify the original object. Default is True.

Returns:

If inplace is True, returns None. Otherwise, returns a new SQLDataModel object without the dropped column(s).

Return type:

None | SQLDataModel

Raises:
  • TypeError – If the column parameter is not of type ‘int’, ‘str’, or a list containing equivalent types.

  • IndexError – If any provided column index is outside the current column range.

  • ValueError – If any provided column name is not found in the model’s headers.

Examples:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Gender', 'City']
data = [
    ('Alice', 30, 'Female', 'Milwaukee'),
    ('Sarah', 35, 'Female', 'Houston'),
    ('Mike', 28, 'Male', 'Atlanta'),
    ('John', 25, 'Male', 'Boston'),
    ('Bob', 22, 'Male', 'Chicago'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Drop the 'Gender' column
df.drop_column('Gender')

# View updated model
print(df)

This will output:

┌───────┬──────┬───────────┐
│ Name  │  Age │ City      │
├───────┼──────┼───────────┤
│ Alice │   30 │ Milwaukee │
│ Sarah │   35 │ Houston   │
│ Mike  │   28 │ Atlanta   │
│ John  │   25 │ Boston    │
│ Bob   │   22 │ Chicago   │
└───────┴──────┴───────────┘
[5 rows x 3 columns]

Dropping multiple columns, starting again from the original four-column model:

# Drop first and last columns by index
df.drop_column([0,-1])

# View updated model
print(df)

This will output:

┌──────┬────────┐
│  Age │ Gender │
├──────┼────────┤
│   30 │ Female │
│   35 │ Female │
│   28 │ Male   │
│   25 │ Male   │
│   22 │ Male   │
└──────┴────────┘
[5 rows x 2 columns]

Drop columns from the original four-column model and return the result as a new SQLDataModel:

# Drop the multiple columns and return as a new model
df = df.drop_column(['Age','Gender'], inplace=False)

# View updated model
print(df)

This will output:

┌───────┬───────────┐
│ Name  │ City      │
├───────┼───────────┤
│ Alice │ Milwaukee │
│ Sarah │ Houston   │
│ Mike  │ Atlanta   │
│ John  │ Boston    │
│ Bob   │ Chicago   │
└───────┴───────────┘
[5 rows x 2 columns]

Note

  • Arguments for column can be a single str or int, a list[str] of column names, or a list[int] of column indices, but names and indices cannot be combined in the same list. For example, passing column = ['First Name', 3] will raise a TypeError exception.

  • The equivalent of this method can also be achieved by simply indexing the required rows and columns using sdm[rows, column] notation, see SQLDataModel.__getitem__() for additional details.
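The argument rules described above can be sketched in plain Python. This is only an illustration of the documented behavior (negative indices, no mixing of names and indices, and the three exception types), not the library's implementation; resolve_columns is a hypothetical helper name.

```python
def resolve_columns(column, headers):
    """Hypothetical helper: map an int, str, or homogeneous list to header names."""
    items = column if isinstance(column, list) else [column]
    # All names: every name must exist in the headers
    if all(isinstance(c, str) for c in items):
        missing = [c for c in items if c not in headers]
        if missing:
            raise ValueError(f"column(s) not found: {missing}")
        return list(items)
    # All indices: negative values count from the end, as with Python lists
    if all(isinstance(c, int) and not isinstance(c, bool) for c in items):
        n = len(headers)
        if any(c < -n or c >= n for c in items):
            raise IndexError(f"column index out of range for {n} columns")
        return [headers[c] for c in items]
    # Anything else, including a mix of names and indices, is rejected
    raise TypeError("column must be int, str, or a list of only one of those types")

headers = ['Name', 'Age', 'Gender', 'City']
to_drop = resolve_columns([0, -1], headers)
remaining = [h for h in headers if h not in to_drop]
print(to_drop, remaining)
```

Here [0, -1] resolves to the first and last headers, which matches the 'Age' and 'Gender' columns left in the multiple-column example above.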

Changelog:
  • Version 0.2.3 (2024-03-28):
    • New method.

drop_row(row: int | Iterable[int], inplace: bool = True, ignore_index: bool = False) None | SQLDataModel[source]

Drops the specified row(s) from the SQLDataModel by index. Values for row can be a single row index or an iterable collection of multiple row indices to drop.

Parameters:
  • row (int | Iterable[int]) – The row index or row indices to drop.

  • inplace (bool, optional) – If True, drops the row(s) in-place and updates the model metadata. If False, returns a new SQLDataModel object without the dropped row(s). Default is True.

  • ignore_index (bool, optional) – If True, drops the row(s) and ignores the index for the resulting model. Default is False, keeping the original indices in the new model.

Returns:

If inplace is True, returns None. Otherwise, returns a new SQLDataModel object without the dropped row(s).

Return type:

None | SQLDataModel

Raises:
  • TypeError – If the row parameter is not of type ‘int’ or an iterable collection of type ‘int’ representing the row indices to drop.

  • IndexError – If any provided row index is outside the current row range determined by the values at SQLDataModel.indicies.

Example:

import sqldatamodel as sdm

headers = ['Rank', 'Location', 'Population']
data = [(1, "Tokyo, Japan", 37.4),
        (2, "Delhi, India", 31.0),
        (3, "Shanghai, China", 27.1),
        (4, "São Paulo, Brazil", 22.0),
        (5, "Mexico City, Mexico", 21.8),
        (6, "Cairo, Egypt", 21.3),
        (7, "Dhaka, Bangladesh", 21.0),
        (8, "Mumbai, India", 20.7),
        (9, "Beijing, China", 20.5),
        (10,"Osaka, Japan", 19.1)]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Drop the last row
df.drop_row(-1)

# Drop rows based on condition of less than 25 Million population
df.drop_row(df['Population'] < 25.0)

# View result
print(df)

This will output:

┌──────┬─────────────────┬────────────┐
│ Rank │ Location        │ Population │
├──────┼─────────────────┼────────────┤
│    1 │ Tokyo, Japan    │       37.4 │
│    2 │ Delhi, India    │       31.0 │
│    3 │ Shanghai, China │       27.1 │
└──────┴─────────────────┴────────────┘
[3 rows x 3 columns]

Dropping multiple rows and returning a new model:

# Create a new model using the same sample data
df = sdm.SQLDataModel(data, headers)

# Set row indices to drop
row_indices = range(0, 5) # or [0, 1, 2, 3, 4]

# Drop top 5 cities and return as a new model
df_new = df.drop_row(row_indices, inplace=False)

# View new model
print(df_new)

This will output:

┌──────┬───────────────────┬────────────┐
│ Rank │ Location          │ Population │
├──────┼───────────────────┼────────────┤
│    6 │ Cairo, Egypt      │       21.3 │
│    7 │ Dhaka, Bangladesh │       21.0 │
│    8 │ Mumbai, India     │       20.7 │
│    9 │ Beijing, China    │       20.5 │
│   10 │ Osaka, Japan      │       19.1 │
└──────┴───────────────────┴────────────┘
[5 rows x 3 columns]

Important

Rows are referenced by their integer position, not by their index value. Row index 0 always refers to the first row in the model, and -1 always refers to the last. The distinction is irrelevant while positions and index values are aligned, but the two diverge once row(s) are dropped from anywhere other than the end of the model.

Note

  • The row indices of the remaining rows are retained by default after a deletion; provide ignore_index=True to reset row indices if required.

  • The equivalent of this method can also be achieved by simply indexing the required rows and columns using sdm[rows, column] notation, see SQLDataModel.__getitem__() for additional details.
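The difference between a row's position and its retained index value can be sketched with plain lists. This is an illustration of the behavior described above under that reading, not the library's code; drop_rows and its index/rows arguments are hypothetical stand-ins for the model's state.

```python
def drop_rows(index, rows, positions, ignore_index=False):
    """Hypothetical stand-in: drop rows by position, keeping or resetting index labels."""
    n = len(rows)
    positions = [positions] if isinstance(positions, int) else list(positions)
    if any(p < -n or p >= n for p in positions):
        raise IndexError(f"row position out of range for {n} rows")
    # Normalize negative positions so -1 means the last row
    drop = {p % n for p in positions}
    kept = [(label, row) for pos, (label, row) in enumerate(zip(index, rows))
            if pos not in drop]
    # Either renumber the survivors from 0 or keep their original labels
    new_index = list(range(len(kept))) if ignore_index else [label for label, _ in kept]
    return new_index, [row for _, row in kept]

index = [0, 1, 2, 3]
rows = ['Tokyo', 'Delhi', 'Shanghai', 'São Paulo']

# Dropping position 0 keeps the original labels 1, 2, 3
index1, rows1 = drop_rows(index, rows, 0)

# Position 0 now refers to the row labelled 1 ('Delhi'), not to label 0
index2, rows2 = drop_rows(index1, rows1, 0)

# With ignore_index=True the surviving rows are renumbered from 0
index3, rows3 = drop_rows(index, rows, [0, -1], ignore_index=True)
print(index1, index2, index3)
```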

Changelog:
  • Version 0.7.4 (2024-06-13):
    • New method.

dropna(axis: Literal['rows', 'columns'] = 'columns', how: Literal['any', 'all'] = 'all', strictly_null: bool = True, ignore_index: bool = True, inplace: bool = True) None | SQLDataModel[source]

Drop rows or columns with NA values from the SQLDataModel.

Parameters:
  • axis (Literal['rows', 'columns'], optional) – The axis along which to drop NA values as either 'rows' or 'columns'. Default is 'columns', dropping columns with all NA values.

  • how (Literal['any', 'all'], optional) – Determines when to drop NA values, 'any' drops if any NA values are present, 'all' drops only if all values are NA. Default is 'all', dropping only when all the values are NA along the specified axis.

  • strictly_null (bool, optional) – If True, only strictly NULL values are considered NA. If False, additional representations of NA (e.g., ‘NaN’, ‘n/a’) are also considered.

  • ignore_index (bool, optional) – If True, the index column is not considered when dropping rows. Ignored when axis is set to 'columns'.

  • inplace (bool, optional) – If True, perform the operation in place and modify the SQLDataModel. If False, return a new SQLDataModel with the NA values dropped.

Returns:

If inplace=False returns a new SQLDataModel with the NA values dropped. Otherwise, modifies the current SQLDataModel in place and returns None.

Return type:

None or SQLDataModel

Raises:
  • ValueError – If axis is not one of (‘rows’, ‘columns’) or how is not one of 'any' or 'all'.

  • DimensionError – If all columns are to be dropped when axis='columns' resulting in an invalid model schema.

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Gender', 'City']
data = [
    ('Sarah', 35, 'Female', 'Houston'),
    ('Alice', None, 'Female', 'Milwaukee'),
    ('Mike', None, 'Male', 'Atlanta'),
    ('John', 25, 'Male', 'Boston'),
    ('Bob', None, 'Male', 'Chicago'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Drop columns with any NA values in place
df.dropna(axis='columns', how='any', inplace=True)

# View result
print(df)

This will output the updated model after dropping the ‘Age’ column:

┌───────┬────────┬───────────┐
│ Name  │ Gender │ City      │
├───────┼────────┼───────────┤
│ Sarah │ Female │ Houston   │
│ Alice │ Female │ Milwaukee │
│ Mike  │ Male   │ Atlanta   │
│ John  │ Male   │ Boston    │
│ Bob   │ Male   │ Chicago   │
└───────┴────────┴───────────┘

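Rows can also be used as the axis to check against. Starting again from the original sample model:

# Drop rows with any NA values, returning the result as a new model
df = df.dropna(axis='rows', how='any', inplace=False)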

# View result
print(df)

This will output the result containing only the rows where no NA values are present:

┌───────┬─────┬────────┬─────────┐
│ Name  │ Age │ Gender │ City    │
├───────┼─────┼────────┼─────────┤
│ Sarah │  35 │ Female │ Houston │
│ John  │  25 │ Male   │ Boston  │
└───────┴─────┴────────┴─────────┘
[2 rows x 4 columns]

Note

  • A value is considered NA if it is the SQL NULL value or, when strictly_null = False, a ‘null like’ value in the specified axis.

  • See SQLDataModel.isna() or SQLDataModel.notna() to filter for rows containing null values.

  • See SQLDataModel.fillna() to fill all missing or null values in the model.
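The interaction of axis and how can be sketched over plain tuples. This is a simplified illustration that treats only None as NA (the strictly_null=True case) and is not the library's implementation; the dropna function below is a hypothetical stand-in.

```python
def dropna(rows, headers, axis='columns', how='all'):
    """Hypothetical stand-in: drop rows or columns where any/all values are None."""
    check = any if how == 'any' else all
    if axis == 'rows':
        return headers, [r for r in rows if not check(v is None for v in r)]
    # For columns, test each column position across every row
    keep = [i for i in range(len(headers)) if not check(r[i] is None for r in rows)]
    return [headers[i] for i in keep], [tuple(r[i] for i in keep) for r in rows]

headers = ['Name', 'Age', 'Gender', 'City']
rows = [
    ('Sarah', 35, 'Female', 'Houston'),
    ('Alice', None, 'Female', 'Milwaukee'),
    ('Mike', None, 'Male', 'Atlanta'),
    ('John', 25, 'Male', 'Boston'),
    ('Bob', None, 'Male', 'Chicago'),
]

# 'Age' has some NA values, so how='any' drops it but how='all' does not
cols_any, _ = dropna(rows, headers, axis='columns', how='any')
cols_all, _ = dropna(rows, headers, axis='columns', how='all')

# Only the rows without any NA value survive
_, rows_any = dropna(rows, headers, axis='rows', how='any')
print(cols_any, cols_all, [r[0] for r in rows_any])
```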

Changelog:
  • Version 1.0.0 (2024-08-09):
    • Changed default to inplace = True to align more with similar SQLDataModel methods.

  • Version 0.12.3 (2024-07-11):
    • New method.

dtypes[source]

The current model data types mapped to each column in the format of {'col': 'dtype'} where 'dtype' is a string representing the corresponding python type.

Type:

dict[str, str]

endswith(pat: str | Iterable[str], case: bool = True) set[int][source]

Return the indices of rows in which any column value ends with the specified pattern(s), converting each value with str(value) for comparison.

Parameters:
  • pat (str | Iterable[str]) – The pattern or iterable of patterns to search for within the data.

  • case (bool, optional) – If True (default), the search is case-sensitive. If False, the search is case-insensitive.

Raises:

TypeError – If argument for pat is not of type ‘str’ or an iterable of type ‘str’ representing the substring pattern(s).

Returns:

Set of row indices containing values that match the pattern(s).

Return type:

set[int]

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Sex', 'City']
data = [
    ('Mike', 31, 'M', 'Chicago'),
    ('John', 25, 'M', 'Dayton'),
    ('Alice', 27, 'F', 'Boston'),
    ('Sarah', 35, 'F', 'Houston'),
    ('Bobby', 42, 'M', 'Chicago'),
    ('Steve', 28, 'F', 'Austin'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter for rows where the 'City' column ends with the string 'ston'
matching_indices = df['City'].endswith('ston')

# Apply filter to model
df_suffix = df[matching_indices]

# View result
print(df_suffix)

This will output the result of applying the filter to the model:

┌───┬───────┬─────┬─────┬─────────┐
│   │ Name  │ Age │ Sex │ City    │
├───┼───────┼─────┼─────┼─────────┤
│ 2 │ Alice │  27 │ F   │ Boston  │
│ 3 │ Sarah │  35 │ F   │ Houston │
└───┴───────┴─────┴─────┴─────────┘
[2 rows x 4 columns]

Instead of searching a single column, the entire model can be searched:

# Method can also search all columns, and be applied directly
df_n = df[df.endswith('N', case=False)]

# View result
print(df_n)

This will output the result of a case-insensitive search:

┌───┬───────┬─────┬─────┬─────────┐
│   │ Name  │ Age │ Sex │ City    │
├───┼───────┼─────┼─────┼─────────┤
│ 1 │ John  │  25 │ M   │ Dayton  │
│ 2 │ Alice │  27 │ F   │ Boston  │
│ 3 │ Sarah │  35 │ F   │ Houston │
│ 5 │ Steve │  28 │ F   │ Austin  │
└───┴───────┴─────┴─────┴─────────┘
[4 rows x 4 columns]

This can be used in combination with the setitem syntax to selectively update values as well:

# Create a new column 'Parity' with a default value
df['Parity'] = None

# Create patterns for even or odd suffixes, as strings since pat must be of type str
even_suffixes = ['0','2','4','6','8']
odd_suffixes = ['1','3','5','7','9']

# Create the filters for both outcomes
even_filter = df.endswith(even_suffixes)
odd_filter = df.endswith(odd_suffixes)

# Update values based on filters using setitem syntax
df[even_filter, 'Parity'] = 'Even'
df[odd_filter, 'Parity'] = 'Odd'

# View result
print(df)

This will output the result of selectively applying updates based on our filters:

┌───┬───────┬─────┬─────┬─────────┬────────┐
│   │ Name  │ Age │ Sex │ City    │ Parity │
├───┼───────┼─────┼─────┼─────────┼────────┤
│ 0 │ Mike  │  31 │ M   │ Chicago │ Odd    │
│ 1 │ John  │  25 │ M   │ Dayton  │ Odd    │
│ 2 │ Alice │  27 │ F   │ Boston  │ Odd    │
│ 3 │ Sarah │  35 │ F   │ Houston │ Odd    │
│ 4 │ Bobby │  42 │ M   │ Chicago │ Even   │
│ 5 │ Steve │  28 │ F   │ Austin  │ Even   │
└───┴───────┴─────┴─────┴─────────┴────────┘
[6 rows x 5 columns]
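The matching rule used in these examples (convert each value with str(value), then compare suffixes) can be sketched in plain Python. This is an illustrative stand-in under that assumption, not the library's implementation; the endswith function below operates on a list of tuples instead of a model.

```python
def endswith(rows, pat, case=True):
    """Hypothetical stand-in: indices of rows where any value's str() ends with pat."""
    patterns = (pat,) if isinstance(pat, str) else tuple(pat)
    if not all(isinstance(p, str) for p in patterns):
        raise TypeError("pat must be a str or an iterable of str")
    if not case:
        patterns = tuple(p.lower() for p in patterns)
    matches = set()
    for idx, row in enumerate(rows):
        for value in row:
            text = str(value) if case else str(value).lower()
            # str.endswith accepts a tuple and matches if any suffix fits
            if text.endswith(patterns):
                matches.add(idx)
                break
    return matches

rows = [
    ('Mike', 31, 'M', 'Chicago'),
    ('John', 25, 'M', 'Dayton'),
    ('Alice', 27, 'F', 'Boston'),
    ('Sarah', 35, 'F', 'Houston'),
    ('Bobby', 42, 'M', 'Chicago'),
    ('Steve', 28, 'F', 'Austin'),
]

ston = endswith(rows, 'ston')
any_n = endswith(rows, 'N', case=False)
even = endswith(rows, ['0', '2', '4', '6', '8'])
print(ston, any_n, even)
```

Because integers are compared as strings, the even-digit patterns match the ages 42 and 28, which is why the suffix patterns have to be supplied as strings.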

Changelog:
  • Version 0.7.8 (2024-06-18):
    • New method.

execute_fetch(sql_query: str, sql_params: tuple = None, **kwargs) SQLDataModel[source]

Returns a new SQLDataModel object, including display and style properties, after executing the provided SQL query using the current SQLDataModel. This method is called by other methods which expect results to be returned from their execution.

Parameters:
  • sql_query (str) – The SQL query to execute with the expectation of rows returned.

  • sql_params (tuple, optional) – The SQL parameters to provide for parameterized queries.

  • **kwargs (optional) – Additional keyword arguments to pass to the SQLDataModel constructor.

Raises:
  • SQLProgrammingError – If the provided SQL query is invalid or malformed.

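  • ValueError – If the provided SQL query was valid but returned 0 rows, which is insufficient to return a new model. As of version 2.3.0 an empty result set is returned as a model with zero rows instead, see the Changelog below.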

Returns:

A new SQLDataModel instance containing the result of the SQL query.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['Column1', 'Column2'])

# Create the SQL query to execute
query = 'SELECT * FROM sdm WHERE Column1 > 10'

# Fetch and save the result to a new instance
result_model = df.execute_fetch(query)

# Create a parameterized SQL query to execute
query = 'SELECT * FROM sdm WHERE Column1 > ? OR Column2 < ?'
params = (10, 20)

# Provide the SQL and the statement parameters
result_parameterized = df.execute_fetch(query, params)

Note

  • Use SQLDataModel.set_model_name() to modify the table name used by the model, default name set as 'sdm'.

  • Display properties such as float precision, index column or table styling are also passed to the new instance when not provided in kwargs.
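The parameter binding used by sql_params follows the qmark (?) placeholder style. A minimal sketch with the standard library sqlite3 module shows the same query shape against a table named 'sdm'; the table contents are made up for illustration, and whether SQLDataModel executes its queries exactly this way is an assumption.

```python
import sqlite3

# Illustrative in-memory table using the default model name 'sdm'
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sdm (Column1 INTEGER, Column2 INTEGER)')
conn.executemany('INSERT INTO sdm VALUES (?, ?)',
                 [(5, 15), (12, 30), (8, 25), (20, 10)])

# Each ? is bound, in order, to one value from params
query = 'SELECT * FROM sdm WHERE Column1 > ? OR Column2 < ?'
params = (10, 20)
cursor = conn.execute(query, params)
headers = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# A query matching nothing still exposes its column names via description
empty = conn.execute('SELECT * FROM sdm WHERE Column1 > ?', (100,))
empty_headers = [col[0] for col in empty.description]
empty_rows = empty.fetchall()
print(headers, rows, empty_headers, empty_rows)
```

The last query shows how column names remain available from query metadata even when no rows are returned.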

Changelog:
  • Version 2.3.0 (2026-01-21):
    • Modified to allow an empty result set: when sql_query returns zero rows, a model with zero rows is returned, using query metadata for column names and dimensions.

  • Version 0.6.2 (2024-05-15):
  • Version 0.1.9 (2024-03-19):
    • New method.

execute_statement(sql_stmt: str, sql_params: tuple = None, update_row_meta: bool = True) None[source]

Executes an arbitrary SQL query against the current model without the expectation of selection or returned rows.

Parameters:
  • sql_stmt (str) – The SQL query to execute.

  • sql_params (tuple, optional) – The SQL parameters to provide for parameterized queries.

  • update_row_meta (bool, optional) – Whether the row count metadata should be updated after executing the statement. Default is True, using SQLDataModel._update_model_metadata() to ensure any schema modifications remain in sync.

Raises:

SQLProgrammingError – If the SQL execution fails.

Returns:

None

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('data.csv')

# Execute statement without results, modifying a column in place
# (column, value and condition are placeholders; 'sdm' is the default table name)
df.execute_statement('UPDATE sdm SET column = value WHERE condition')

# Execute a parameterized statement by providing values
df.execute_statement('DELETE FROM sdm WHERE idx = ? OR name = ?', (7,'Bob'))
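For a concrete picture of a parameterized statement that returns no rows, the following sketch runs the same DELETE against a small in-memory table using the standard library sqlite3 module. The table layout and values are assumptions made up for illustration, not output from SQLDataModel.

```python
import sqlite3

# Illustrative table named 'sdm' with an 'idx' index column and a 'name' column
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sdm (idx INTEGER PRIMARY KEY, name TEXT)')
conn.executemany('INSERT INTO sdm (idx, name) VALUES (?, ?)',
                 [(0, 'Alice'), (1, 'Bob'), (7, 'Carol'), (8, 'Dave')])

# The statement returns no rows, but it changes how many rows the table holds
cursor = conn.execute('DELETE FROM sdm WHERE idx = ? OR name = ?', (7, 'Bob'))
deleted = cursor.rowcount
conn.commit()

remaining = conn.execute('SELECT idx, name FROM sdm ORDER BY idx').fetchall()
print(deleted, remaining)
```

Statements like this change the set of rows, which is the situation the update_row_meta parameter is meant to keep the model metadata in sync with.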

Changelog:
  • Version 0.8.0 (2024-06-21):
    • Added update_row_meta parameter to speed up transactions that are guaranteed to have no effect on the current model SQLDataModel.indicies metadata. A shallower and computationally cheaper check will still occur to ensure SQLDataModel.header_master remains in sync.

  • Version 0.7.4 (2024-06-13):
    • Added sql_params parameter to allow parameterized statements similar to other SQL execution methods.

  • Version 0.1.9 (2024-03-19):
    • New method.

execute_transaction(sql_script: str, update_row_meta: bool = True) None[source]

Executes a prepared SQL script wrapped in a transaction against the current model without the expectation of selection or returned rows.

Parameters:
  • sql_script (str) – The SQL script to execute within a transaction.

  • update_row_meta (bool, optional) – Whether the row count metadata should be updated after executing the transaction. Default is True, using SQLDataModel._update_model_metadata() to ensure any schema modifications remain in sync.

Raises:

SQLProgrammingError – If the provided sql_script cannot be executed or the SQL execution fails.

Returns:

None

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('data.csv')

# Script to update columns with predicate
transaction_script = '''
    UPDATE table1 SET column1 = value1 WHERE condition1;
    UPDATE table2 SET column2 = value2 WHERE condition2;
'''

# Execute the script
df.execute_transaction(transaction_script)
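What wrapping a script in a transaction provides can be sketched with the standard library sqlite3 module: either every statement in the script is applied, or none are. The helper below is a hypothetical illustration with made-up table contents, and is not how SQLDataModel necessarily implements the method.

```python
import sqlite3

def execute_transaction(conn, sql_script):
    """Hypothetical helper: run a script atomically, rolling back on any SQL error."""
    try:
        conn.executescript(f"BEGIN;\n{sql_script}\nCOMMIT;")
    except sqlite3.Error:
        # Undo any statements that ran before the failure
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise

# isolation_level=None leaves transaction control to the explicit BEGIN/COMMIT
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sdm (idx INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO sdm (name, age) VALUES (?, ?)",
                 [("Alice", 30), ("Bob", 22)])

# Both statements succeed, so both changes are committed
execute_transaction(conn, """
    UPDATE sdm SET age = age + 1 WHERE name = 'Alice';
    UPDATE sdm SET name = 'Robert' WHERE name = 'Bob';
""")

# The second statement fails, so the first one is rolled back as well
try:
    execute_transaction(conn, """
        UPDATE sdm SET age = 0;
        UPDATE missing_table SET age = 1;
    """)
    failed = False
except sqlite3.Error:
    failed = True

result = conn.execute("SELECT name, age FROM sdm ORDER BY idx").fetchall()
print(failed, result)
```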

Changelog:
  • Version 0.8.0 (2024-06-21):
    • Added update_row_meta parameter to speed up transactions that are guaranteed to have no effect on the current model SQLDataModel.indicies metadata. A shallower and computationally cheaper check will still occur to ensure SQLDataModel.header_master remains in sync.

  • Version 0.1.9 (2024-03-19):
    • New method.

fillna(value, strictly_null: bool = False, inplace: bool = True) None | SQLDataModel[source]

Fills missing (na or nan) values in the current SQLDataModel with the provided value, either in place or as a new instance.

Parameters:
  • value – The scalar value to fill missing values with. Should be of type ‘str’, ‘int’, ‘float’, ‘bytes’, or ‘bool’.

  • inplace (bool) – If True, modifies the current instance in-place. If False, returns a new instance with missing values filled.

  • strictly_null (bool) – If True, only strictly null values are filled. If False, values like 'NA', 'NaN', 'n/a', 'na', and whitespace only strings are also filled.

Raises:

TypeError – If value is not a scalar type or is incompatible with SQLite’s type system.

Returns:

When inplace=True, modifies the model in place and returns None. When inplace=False, returns a new SQLDataModel.

Return type:

None or SQLDataModel

Example:

import sqldatamodel as sdm

# Create sample data
data = [('Alice', 25, None), ('Bob', None, 'N/A'), ('Charlie', 'NaN', ' '), ('David', 30, 'NA')]

# Create the model
df = sdm.SQLDataModel(data, headers=['Name', 'Age', 'Status'])

# Fill missing values with 0
df_filled = df.fillna(value=0, strictly_null=False, inplace=False)

# View filled model
print(df_filled)

This will output:

┌───┬─────────┬──────┬────────┐
│   │ Name    │  Age │ Status │
├───┼─────────┼──────┼────────┤
│ 0 │ Alice   │   25 │      0 │
│ 1 │ Bob     │    0 │      0 │
│ 2 │ Charlie │    0 │      0 │
│ 3 │ David   │   30 │      0 │
└───┴─────────┴──────┴────────┘
[4 rows x 3 columns]

Note

  • The method supports filling missing values with various scalar types, which are then adapted to the column's set dtype.

  • The strictly_null parameter controls whether additional values like 'NA', 'NAN', 'n/a', 'na' and '' (an empty string) are also treated as null.
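The effect of strictly_null can be sketched over plain tuples. The exact set of null-like strings and the case-insensitive, whitespace-stripped comparison used below are assumptions for illustration; is_missing and fillna here are hypothetical stand-ins, not the library's code.

```python
# Assumed null-like strings, compared after stripping whitespace and lowercasing
NULL_LIKE = {'', 'na', 'nan', 'n/a'}

def is_missing(value, strictly_null=False):
    """Hypothetical check: None is always missing, null-like strings only if allowed."""
    if value is None:
        return True
    if strictly_null:
        return False
    return isinstance(value, str) and value.strip().lower() in NULL_LIKE

def fillna(rows, value, strictly_null=False):
    """Return new rows with every missing value replaced by `value`."""
    return [tuple(value if is_missing(v, strictly_null) else v for v in row)
            for row in rows]

data = [('Alice', 25, None), ('Bob', None, 'N/A'),
        ('Charlie', 'NaN', ' '), ('David', 30, 'NA')]

filled = fillna(data, 0, strictly_null=False)
strict = fillna(data, 0, strictly_null=True)
print(filled)
print(strict)
```

With strictly_null=False the result matches the filled table shown above, while strictly_null=True replaces only the two None values.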

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

freeze_index(column_name: str = None) None[source]

Freeze the current index as a new column, expanding it into the current model. The new column is unaffected by any future changes to the primary index column.

Parameters:

column_name (str, optional) – The name for the new frozen index column. If not provided, a default name ‘frzn_id’ will be used.

Raises:

TypeError – If the provided column_name is not of type ‘str’.

Returns:

None

Example:

import sqldatamodel as sdm

headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01'),
    ('Sarah', 'West', 39, 0.7, '2023-10-01'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Freeze index as new column 'id'
df.freeze_index("id")

# View model
print(df)

This will output:

┌───┬───────┬─────────┬──────┬─────────┬────────────┬──────┐
│   │ first │ last    │  age │ service │ hire_date  │   id │
├───┼───────┼─────────┼──────┼─────────┼────────────┼──────┤
│ 0 │ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │    0 │
│ 1 │ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │    1 │
│ 2 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │    2 │
│ 3 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │    3 │
│ 4 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │    4 │
└───┴───────┴─────────┴──────┴─────────┴────────────┴──────┘
[5 rows x 6 columns]

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

classmethod from_csv(csv_source: str, infer_types: bool = True, encoding: str = 'Latin1', delimiter: str = ',', quotechar: str = '"', headers: list[str] = None, **kwargs) SQLDataModel[source]

Returns a new SQLDataModel generated from the provided CSV source, which can be either a file path or a raw delimited string.

Parameters:
  • csv_source (str) – The path to the CSV file or a raw delimited string.

  • infer_types (bool, optional) – Infer column types based on a random subset of the data. Default is True; when False, all columns are str type.

  • encoding (str, optional) – The encoding used to decode the CSV source if it is a file. Default is ‘Latin1’.

  • delimiter (str, optional) – The delimiter to use when parsing CSV source. Default is ,.

  • quotechar (str, optional) – The character used for quoting fields. Default is ".

  • headers (List[str], optional) – List of column headers. If None, the first row of the CSV source is assumed to contain headers.

  • **kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the provided CSV source.

Return type:

SQLDataModel

Raises:
  • ValueError – If no delimited data is found in csv_source or if parsing with delimiter does not yield valid tabular data.

  • Exception – If an error occurs while attempting to read from or process the provided CSV source.

Examples:

From CSV File:

import sqldatamodel as sdm

# CSV file path or raw CSV string
csv_source = "/path/to/data.csv"

# Create the model using the CSV file, providing custom headers
df = sdm.from_csv(csv_source, headers=['ID', 'Name', 'Value'])

From CSV Literal:

import sqldatamodel as sdm

# CSV data
data = '''
A, B, C
1a, 1b, 1c
2a, 2b, 2c
3a, 3b, 3c
'''

# Create the model
df = sdm.from_csv(data)

# View result
print(df)

This will output:

┌──────┬──────┬──────┐
│ A    │ B    │ C    │
├──────┼──────┼──────┤
│ 1a   │ 1b   │ 1c   │
│ 2a   │ 2b   │ 2c   │
│ 3a   │ 3b   │ 3c   │
└──────┴──────┴──────┘
[3 rows x 3 columns]

Note

  • If csv_source is delimited by characters other than those specified, use SQLDataModel.from_delimited() and pass the delimiter character(s) to its delimiters parameter.

  • If headers are provided, the first row parsed from source will be the first row in the table and not discarded.

  • The infer_types argument can be used to infer the appropriate data type for each column:

    • When infer_types = True, a random subset of the data will be used to infer the correct type and cast values accordingly

    • When infer_types = False, values from the first row only will be used to assign types, almost always ‘str’ when reading from CSV.
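The header handling described in the notes above can be sketched with the standard library csv module: without headers the first parsed row names the columns, and with headers it stays in the data. parse_csv is a hypothetical helper for illustration, and the use of skipinitialspace to trim the space after each comma is an assumption about how the literal is read.

```python
import csv
import io

def parse_csv(text, headers=None, delimiter=',', quotechar='"'):
    """Hypothetical helper: parse a CSV literal into (headers, rows)."""
    reader = csv.reader(io.StringIO(text.strip()), delimiter=delimiter,
                        quotechar=quotechar, skipinitialspace=True)
    rows = [tuple(row) for row in reader if row]
    if headers is None:
        # No headers supplied: the first row is consumed as the header row
        headers, rows = list(rows[0]), rows[1:]
    return headers, rows

data = '''
A, B, C
1a, 1b, 1c
2a, 2b, 2c
3a, 3b, 3c
'''

inferred_headers, inferred_rows = parse_csv(data)
custom_headers, custom_rows = parse_csv(data, headers=['X', 'Y', 'Z'])
print(inferred_headers, len(inferred_rows))
print(custom_headers, len(custom_rows))
```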

Changelog:
  • Version 0.4.0 (2024-04-23):
    • Modified to only parse CSV files and removed all delimiter sniffing with introduction of new method SQLDataModel.from_delimited() to handle other delimiters.

    • Renamed delimiters parameter to delimiter with , set as new default to reflect revised focus on CSV files only.

classmethod from_data(data: Any = None, **kwargs) SQLDataModel[source]

Convenience method to infer the source of data and return the appropriate constructor method to generate a new SQLDataModel instance.

Parameters:
  • data (Any, required) – The input data from which to create the SQLDataModel object.

  • **kwargs – Additional keyword arguments to be passed to the constructor method, see init method for arguments.

Constructor methods are called according to the input type:
Returns:

The SQLDataModel object created from the provided data.

Return type:

SQLDataModel

Raises:
  • TypeError – If the type of data is not supported.

  • ValueError – If the file extension is not found, unsupported, or if the SQL extension is not supported.

  • Exception – If an OS related error occurs during file read operations when data is a filepath.

Example:

import sqldatamodel as sdm

# Create SQLDataModel from a CSV file
df_csv = sdm.from_data("data.csv", headers=['ID', 'Name', 'Value'])

# Create SQLDataModel from a dictionary of column values
df_dict = sdm.from_data({"ID": [1, 2], "Name": ['Alice', 'Bob'], "Value": [100.0, 200.0]})

# Create SQLDataModel from a list of tuples
df_list = sdm.from_data([(1, 'Alice', 100.0), (2, 'Bob', 200.0)], headers=['ID', 'Name', 'Value'])

# Create SQLDataModel from raw string literal
delimited_literal = '''
A, B, C
1, 2, 3
4, 5, 6
7, 8, 9
'''

# Create the model by having correct constructor inferred
df = sdm.from_data(delimited_literal)

# View output
print(df)

This will output:

┌────┬────┬────┐
│ A  │ B  │ C  │
├────┼────┼────┤
│ 1  │ 2  │ 3  │
│ 4  │ 5  │ 6  │
│ 7  │ 8  │ 9  │
└────┴────┴────┘
[3 rows x 3 columns]

Note

  • This method attempts to infer the correct method to call based on the data argument; if one cannot be inferred, an exception is raised.

  • For data type specific implementation or examples, see related method for appropriate data type.

classmethod from_delimited(source: str, infer_types: bool = True, encoding: str = 'Latin1', delimiters: str = ', \t;|:', quotechar: str = '"', headers: list[str] = None, **kwargs) SQLDataModel[source]

Returns a new SQLDataModel generated from the provided delimited source, which can be either a file path or a raw delimited string.

Parameters:
  • source (str) – The path to the delimited file or a raw delimited string.

  • infer_types (bool, optional) – Infer column types based on a random subset of the data. Default is True; when False, all columns are str type.

  • encoding (str, optional) – The encoding used to decode the source if it is a file. Default is 'Latin1'.

  • delimiters (str, optional) – Possible delimiters. Default is \s, \t, ;, |, : or , (space, tab, semicolon, pipe, colon or comma).

  • quotechar (str, optional) – The character used for quoting fields. Default is ".

  • headers (list[str], optional) – List of column headers. If None, the first row of the delimited source is assumed to be the header row.

  • **kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the provided delimited source.

Return type:

SQLDataModel

Raises:
  • ValueError – If no delimiter is found in source or if parsing with delimiter does not yield valid tabular data.

  • Exception – If an error occurs while attempting to read from or process the provided delimited source.

Example:

From Delimited Literal:

import sqldatamodel as sdm

# Space delimited literal
source_data = '''
Name Age Height
Beth 27 172.4
Kate 28 162.0
John 30 175.3
Will 35 185.8'''

# Create the model
df = sdm.from_delimited(source_data)

# View output
print(df)

This will output:

┌──────┬─────┬─────────┐
│ Name │ Age │  Height │
├──────┼─────┼─────────┤
│ Beth │  27 │  172.40 │
│ Kate │  28 │  162.00 │
│ John │  30 │  175.30 │
│ Will │  35 │  185.80 │
└──────┴─────┴─────────┘
[4 rows x 3 columns]

From Delimited File:

import sqldatamodel as sdm

# Tab separated file
tsv_file = 'persons.tsv'

# Create the model
df = sdm.from_delimited(tsv_file)

Note

  • Use SQLDataModel.from_csv() if delimiter in source is already known and available as this method requires more compute to determine a plausible delimiter.

  • Use SQLDataModel.from_text() if data is not delimited but is a string representation such as an ASCII table or the output from another SQLDataModel instance.

  • If file is delimited by delimiters other than the default targets \s, \t, ;, |, : or , (space, tab, semicolon, pipe, colon or comma) make sure they are provided as single character values to delimiters.
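One plausible way to pick a delimiter from a set of candidates is to look for a character that occurs the same non-zero number of times on every line. The sketch below illustrates that idea only; it is not the detection logic used by SQLDataModel, and guess_delimiter and parse_delimited are hypothetical names.

```python
def guess_delimiter(text, candidates=', \t;|:'):
    """Hypothetical heuristic: pick the candidate with a consistent per-line count."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    best, best_count = None, 0
    for ch in candidates:
        counts = {ln.count(ch) for ln in lines}
        # A delimiter should appear equally often on every line, and at least once
        if len(counts) == 1:
            n = counts.pop()
            if n > best_count:
                best, best_count = ch, n
    if best is None:
        raise ValueError("no consistent delimiter found in source")
    return best

def parse_delimited(text, candidates=', \t;|:'):
    """Split each non-empty line on the guessed delimiter; first row is the header."""
    delim = guess_delimiter(text, candidates)
    table = [ln.split(delim) for ln in text.splitlines() if ln.strip()]
    return table[0], table[1:]

source_data = '''
Name Age Height
Beth 27 172.4
Kate 28 162.0
John 30 175.3
Will 35 185.8'''

delimiter = guess_delimiter(source_data)
headers, rows = parse_delimited(source_data)
print(repr(delimiter), headers, rows[0])
```

Scanning every candidate over every line is the extra work that is avoided when the delimiter is already known.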

Changelog:
  • Version 0.4.0 (2024-04-23):
    • New method.

classmethod from_dict(data: dict | list, **kwargs) SQLDataModel[source]

Create a new SQLDataModel instance from the provided dictionary.

Parameters:
  • data (dict | list) – The dictionary or list of dictionaries to convert to SQLDataModel. If keys are of type int, they will be used as row indexes; otherwise, keys will be used as headers.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the provided dictionary.

Return type:

SQLDataModel

Raises:
  • TypeError – If the provided dictionary values are not of type ‘list’, ‘tuple’, or ‘dict’.

  • ValueError – If the provided data appears to be a list of dicts but is empty.

Example:

import sqldatamodel as sdm

# Sample data with column orientation
data = {
    'Name': ['Beth', 'John', 'Alice', 'Travis'],
    'Height': [172.4, 175.3, 162.0, 185.8],
    'Age': [27, 30, 28, 35]
}

# Create the model
df = sdm.from_dict(data)

# View it
print(df)

This will output:

┌────────┬─────────┬─────┐
│ Name   │  Height │ Age │
├────────┼─────────┼─────┤
│ Beth   │  172.40 │  27 │
│ John   │  175.30 │  30 │
│ Alice  │  162.00 │  28 │
│ Travis │  185.80 │  35 │
└────────┴─────────┴─────┘
[4 rows x 3 columns]

We can also create a model using a dictionary with row orientation:

import sqldatamodel as sdm

# Sample data with row orientation
data = {
     0: ['Mercury', 0.38]
    ,1: ['Venus', 0.91]
    ,2: ['Earth', 1.00]
    ,3: ['Mars', 0.38]
}

# Create the model with custom headers
df = sdm.from_dict(data, headers=['Planet', 'Gravity'])

# View output
print(df)

This will output the model created using row-wise dictionary data:

┌─────────┬─────────┐
│ Planet  │ Gravity │
├─────────┼─────────┤
│ Mercury │    0.38 │
│ Venus   │    0.91 │
│ Earth   │    1.00 │
│ Mars    │    0.38 │
└─────────┴─────────┘
[4 rows x 2 columns]

Note

  • If data orientation suggests JSON like structure, then SQLDataModel.from_json() will attempt to construct the model.

  • Dictionaries in list like orientation can also be used with structures similar to JSON objects.

  • The method determines the structure of the SQLDataModel based on the format of the provided dictionary.

  • If the keys are integers, they are used as row indexes; otherwise, keys are used as headers.

  • See SQLDataModel.to_dict() for converting existing instances of SQLDataModel to dictionaries.
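The orientation rule from the notes above (integer keys mean rows, other keys mean column headers) can be sketched in plain Python. This is a simplified illustration that ignores the JSON-like and list-of-dicts cases, and dict_to_table is a hypothetical helper, not the library's implementation.

```python
def dict_to_table(data, headers=None):
    """Hypothetical helper: turn a row- or column-oriented dict into (headers, rows)."""
    if all(isinstance(key, int) for key in data):
        # Row orientation: each value is one row, keys are row indexes
        return headers, [tuple(row) for row in data.values()]
    # Column orientation: keys are headers, values are transposed into rows
    return headers or list(data), list(zip(*data.values()))

column_data = {
    'Name': ['Beth', 'John', 'Alice', 'Travis'],
    'Height': [172.4, 175.3, 162.0, 185.8],
    'Age': [27, 30, 28, 35],
}
row_data = {
    0: ['Mercury', 0.38],
    1: ['Venus', 0.91],
    2: ['Earth', 1.00],
    3: ['Mars', 0.38],
}

col_headers, col_rows = dict_to_table(column_data)
row_headers, row_rows = dict_to_table(row_data, headers=['Planet', 'Gravity'])
print(col_headers, col_rows[0])
print(row_headers, row_rows[0])
```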

Changelog:
  • Version 0.6.3 (2024-05-16):
    • Modified to try parsing input data as JSON if initial inspection does not signify row or column orientation.

  • Version 0.1.5 (2023-11-24):
    • New method.

classmethod from_excel(filename: str, worksheet: int | str = 0, min_row: int | None = None, max_row: int | None = None, min_col: int | None = None, max_col: int | None = None, headers: list[str] = None, **kwargs) SQLDataModel[source]

Returns a new SQLDataModel instance from the specified Excel file.

Parameters:
  • filename (str) – The file path to the Excel file, e.g., filename = 'titanic.xlsx'.

  • worksheet (int | str, optional) – The index or name of the worksheet to read from. Defaults to 0, indicating the first worksheet.

  • min_row (int | None, optional) – The minimum row number to start reading data from. Defaults to None, indicating the first row.

  • max_row (int | None, optional) – Maximum row index (1-based) to import. Defaults to None, indicating all rows are read.

  • min_col (int | None, optional) – Minimum column index (1-based) to import. Defaults to None, indicating the first column.

  • max_col (int | None, optional) – Maximum column index (1-based) to import. Defaults to None, indicating all the columns are read.

  • headers (List[str], optional) – The column headers for the data. Default is None, using the first row of the Excel sheet as headers.

  • **kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.

Raises:
  • ModuleNotFoundError – If the required package openpyxl is not installed as determined by optionals._has_xl flag.

  • TypeError – If the filename argument is not of type ‘str’ representing a valid Excel file path.

  • Exception – If an error occurs during Excel read and write operations related to openpyxl processing.

Returns:

A new instance of SQLDataModel created from the Excel file.

Return type:

SQLDataModel

Examples:

We’ll use this Excel file, data.xlsx, as the source for the below examples:

    ┌───────┬─────┬────────┬───────────┐
    │ A     │ B   │ C      │ D         │
┌───┼───────┼─────┼────────┼───────────┤
│ 1 │ Name  │ Age │ Gender │ City      │
│ 2 │ John  │ 25  │ Male   │ Boston    │
│ 3 │ Alice │ 30  │ Female │ Milwaukee │
│ 4 │ Bob   │ 22  │ Male   │ Chicago   │
│ 5 │ Sarah │ 35  │ Female │ Houston   │
│ 6 │ Mike  │ 28  │ Male   │ Atlanta   │
└───┴───────┴─────┴────────┴───────────┘
[ Sheet1 ]

Example 1: Load Excel file with default parameters

import sqldatamodel as sdm

# Create the model using the default parameters
df = sdm.from_excel('data.xlsx')

# View imported data
print(df)

This will output all of the data starting from ‘A1’:

┌───────┬──────┬────────┬───────────┐
│ Name  │  Age │ Gender │ City      │
├───────┼──────┼────────┼───────────┤
│ John  │   25 │ Male   │ Boston    │
│ Alice │   30 │ Female │ Milwaukee │
│ Bob   │   22 │ Male   │ Chicago   │
│ Sarah │   35 │ Female │ Houston   │
│ Mike  │   28 │ Male   │ Atlanta   │
└───────┴──────┴────────┴───────────┘
[5 rows x 4 columns]

Example 2: Load Excel file from specific worksheet

import sqldatamodel as sdm

# Create the model from 'Sheet2'
df = sdm.from_excel('data.xlsx', worksheet='Sheet2')

# View imported data
print(df)

This will output the contents of ‘Sheet2’:

┌────────┬───────┐
│ Gender │ count │
├────────┼───────┤
│ Male   │     3 │
│ Female │     2 │
└────────┴───────┘
[2 rows x 2 columns]

Example 3: Load Excel file with custom headers starting from different row

import sqldatamodel as sdm

# Use our own headers instead of the Excel ones
new_cols = ['Col A', 'Col B', 'Col C', 'Col D']

# Create the model starting from the 2nd row to ignore the original headers
df = sdm.from_excel('data.xlsx', min_row=2, headers=new_cols)

# View the data
print(df)

This will output the data with our renamed headers:

┌───────┬───────┬────────┬───────────┐
│ Col A │ Col B │ Col C  │ Col D     │
├───────┼───────┼────────┼───────────┤
│ John  │    25 │ Male   │ Boston    │
│ Alice │    30 │ Female │ Milwaukee │
│ Bob   │    22 │ Male   │ Chicago   │
│ Sarah │    35 │ Female │ Houston   │
│ Mike  │    28 │ Male   │ Atlanta   │
└───────┴───────┴────────┴───────────┘
[5 rows x 4 columns]

Example 4: Load Excel file with specific subset of columns

import sqldatamodel as sdm

# Create the model using the middle two columns only
df = sdm.from_excel('data.xlsx', min_col=2, max_col=3)

# View the data
print(df)

This will output only the middle two columns:

┌──────┬────────┐
│  Age │ Gender │
├──────┼────────┤
│   25 │ Male   │
│   30 │ Female │
│   22 │ Male   │
│   35 │ Female │
│   28 │ Male   │
└──────┴────────┘
[5 rows x 2 columns]

Note

  • This method relies entirely on openpyxl, see their documentation for further information on Excel file handling in Python.

  • If custom headers are provided using the default min_row, then the original headers, if present, will be duplicated.

  • All indices for min_row, max_row, min_col and max_col are 1-based instead of 0-based, again see openpyxl for more details.

  • See related SQLDataModel.to_excel() for exporting an existing SQLDataModel to Excel.
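
The 1-based, inclusive bounds described in the notes above can be pictured with plain list slicing. This is only an illustration on an in-memory list: `window` is a hypothetical helper and does not reflect how sqldatamodel or openpyxl actually read a workbook.

```python
# Illustrative only: mimics the documented 1-based, inclusive bounds on a
# plain list of rows. `window` is a hypothetical helper, not a library call.
sheet = [
    ["Name", "Age", "Gender", "City"],
    ["John", 25, "Male", "Boston"],
    ["Alice", 30, "Female", "Milwaukee"],
    ["Bob", 22, "Male", "Chicago"],
]

def window(rows, min_row=None, max_row=None, min_col=None, max_col=None):
    """Return the cells inside the 1-based, inclusive row and column bounds."""
    r0 = (min_row or 1) - 1  # 1-based start becomes a 0-based slice start
    c0 = (min_col or 1) - 1
    # An inclusive 1-based end equals an exclusive 0-based end; None means "to the end"
    return [row[c0:max_col] for row in rows[r0:max_row]]

# Middle two columns, as in Example 4
middle = window(sheet, min_col=2, max_col=3)
print(middle[0])  # ['Age', 'Gender']

# Skip the original header row, as in Example 3
body = window(sheet, min_row=2)
print(body[0])  # ['John', 25, 'Male', 'Boston']
```

Because the default min_row is the first row, passing custom headers without min_row=2 would leave the original header row inside `body`, which is the duplication the note warns about.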

Changelog:
  • Version 0.2.2 (2024-03-26):
    • New method.

classmethod from_html(html_source: str, encoding: str = 'utf-8', table_identifier: int | str = 1, infer_types: bool = True, **kwargs) SQLDataModel[source]

Parses an HTML table element from one of three possible sources: a web page at a URL, a local file at a path, or a raw HTML string literal. If table_identifier is an int, it is treated as a sequential index over the <table> elements parsed from the top of the page down, starting at ‘1’ for the first table found. If table_identifier is a str, the parser returns the table whose ‘id’ or ‘name’ HTML attribute matches the identifier. If table_identifier is not specified, the first <table> element successfully parsed is returned.

Parameters:
  • html_source (str) – The HTML source, which can be a URL, a valid path to an HTML file, or a raw HTML string. If it starts with ‘http’, the argument is considered a URL and the table will be parsed from the returned web request. If it is a valid file path, the argument is considered a local file and the table will be parsed from its HTML. If it is neither a URL nor a valid path, the argument is considered a raw HTML string and the table will be parsed directly from the input.

  • encoding (str) – The encoding to use for reading HTML when html_source is considered a valid url or file path (default is ‘utf-8’).

  • table_identifier (int | str) – An identifier to specify which table to parse if there are multiple tables in the HTML source. Default is 1, returning the first table element found. If it is an int, the identifier is treated as the indexed location of the <table> element on the page from top to bottom, starting from 1, and the table at that position is returned. If it is a str, the identifier is treated as a target HTML ‘id’ or ‘name’ attribute to search for, and the first case-insensitive match is returned.

  • infer_types (bool, optional) – If column data types should be inferred in the returned model. Default is True, meaning column types will be inferred; otherwise all columns are returned as ‘str’ types.

  • **kwargs – Additional keyword arguments to pass when using urllib.request.urlopen to fetch HTML from a URL.
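
The three-way handling of html_source described above can be sketched as a small classifier. The function name `classify_html_source` and the use of `os.path.isfile` are assumptions made for illustration; the library's own checks may differ in detail.

```python
import os

def classify_html_source(html_source: str) -> str:
    """Return 'url', 'file' or 'raw', following the order described above."""
    if not isinstance(html_source, str):
        raise TypeError("html_source must be of type 'str'")
    if html_source.startswith("http"):
        return "url"    # would be fetched with urllib.request.urlopen
    if os.path.isfile(html_source):
        return "file"   # would be opened and read with the given encoding
    return "raw"        # parsed directly as an HTML string

print(classify_html_source("https://en.wikipedia.org/wiki/FIFA_World_Cup"))  # url
print(classify_html_source("<table><tr><td>1</td></tr></table>"))            # raw
```

One consequence of this ordering is that a mistyped file path is not rejected as a missing file. It falls through to the raw HTML branch, where no <table> element will be found.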

Returns:

A new SQLDataModel instance containing the data from the parsed HTML table.

Return type:

SQLDataModel

Raises:
  • TypeError – If html_source is not of type str representing a possible url, filepath or raw HTML stream.

  • HTTPError – Raised from urllib when html_source is considered a url and an HTTP exception occurs.

  • URLError – Raised from urllib when html_source is considered a url and a URL exception occurs.

  • ValueError – If no <table> elements are found or if the targeted table_identifier is not found.

  • OSError – Related exceptions that may be raised when html_source is considered a file path.
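
As a rough picture of how table_identifier selects a table, the sketch below collects tables with the standard library `html.parser` and then picks one by one-based position or by a case-insensitive ‘id’ or ‘name’ match. `TableCollector` and `select_table` are hypothetical names, nested tables are not handled, and this is not the parser used by SQLDataModel.

```python
from html.parser import HTMLParser

class TableCollector(HTMLParser):
    """Collects each <table> as {'attrs': {...}, 'rows': [[cell, ...], ...]}."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append({"attrs": dict(attrs), "rows": []})
        elif tag == "tr" and self.tables:
            self.tables[-1]["rows"].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]["rows"]:
            self.tables[-1]["rows"][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.tables[-1]["rows"][-1][-1] += data.strip()

def select_table(html, table_identifier=1):
    """Return the rows of the table matching a 1-based index or an id/name."""
    parser = TableCollector()
    parser.feed(html)
    if not parser.tables:
        raise ValueError("no <table> elements found")
    if isinstance(table_identifier, int):
        if not 1 <= table_identifier <= len(parser.tables):
            raise ValueError(f"table {table_identifier} not found")
        return parser.tables[table_identifier - 1]["rows"]
    target = table_identifier.lower()
    for table in parser.tables:
        names = {(table["attrs"].get(k) or "").lower() for k in ("id", "name")} - {""}
        if target in names:
            return table["rows"]
    raise ValueError(f"no table with id or name '{table_identifier}' found")

sample_html = (
    '<table id="first"><tr><th>A</th></tr><tr><td>1</td></tr></table>'
    '<table id="Find-Me"><tr><th>Col 1</th><th>Col 2</th></tr>'
    "<tr><td>B</td><td>2</td></tr></table>"
)
print(select_table(sample_html, 2))          # [['Col 1', 'Col 2'], ['B', '2']]
print(select_table(sample_html, "find-me"))  # [['Col 1', 'Col 2'], ['B', '2']]
```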

Examples:

From Website URL

import sqldatamodel as sdm

# From URL
url = 'https://en.wikipedia.org/wiki/1998_FIFA_World_Cup'

# Let's get the 95th table from the 1998 World Cup page
df = sdm.from_html(url, table_identifier=95)

# View result:
print(df)

This will output:

┌────┬─────────────┬────┬────┬────┬────┬────┬────┬────┬─────┬──────┐
│  R │ Team        │ G  │  P │  W │  D │  L │ GF │ GA │ GD  │ Pts. │
├────┼─────────────┼────┼────┼────┼────┼────┼────┼────┼─────┼──────┤
│  1 │ France      │ C  │  7 │  6 │  1 │  0 │ 15 │  2 │ +13 │   19 │
│  2 │ Brazil      │ A  │  7 │  4 │  1 │  2 │ 14 │ 10 │ +4  │   13 │
│  3 │ Croatia     │ H  │  7 │  5 │  0 │  2 │ 11 │  5 │ +6  │   15 │
│  4 │ Netherlands │ E  │  7 │  3 │  3 │  1 │ 13 │  7 │ +6  │   12 │
│  5 │ Italy       │ B  │  5 │  3 │  2 │  0 │  8 │  3 │ +5  │   11 │
│  6 │ Argentina   │ H  │  5 │  3 │  1 │  1 │ 10 │  4 │ +6  │   10 │
│  7 │ Germany     │ F  │  5 │  3 │  1 │  1 │  8 │  6 │ +2  │   10 │
│  8 │ Denmark     │ C  │  5 │  2 │  1 │  2 │  9 │  7 │ +2  │    7 │
└────┴─────────────┴────┴────┴────┴────┴────┴────┴────┴─────┴──────┘
[8 rows x 11 columns]

From Local File

import sqldatamodel as sdm

# From HTML file
df = sdm.from_html('path/to/file.html')

# View output
print(df)

This will output:

┌─────────────┬────────┬──────┐
│ Team        │ Points │ Rank │
├─────────────┼────────┼──────┤
│ Brazil      │ 63.7   │ 1    │
│ England     │ 50.7   │ 2    │
│ Spain       │ 50.0   │ 3    │
│ Germany [a] │ 49.3   │ 4    │
│ Mexico      │ 47.3   │ 5    │
│ France      │ 46.0   │ 6    │
│ Italy       │ 44.3   │ 7    │
│ Argentina   │ 44.0   │ 8    │
└─────────────┴────────┴──────┘
[8 rows x 3 columns]

From Raw HTML

import sqldatamodel as sdm

# Raw HTML
raw_html = '''<table id="find-me">
    <tr>
        <th>Col 1</th>
        <th>Col 2</th>
    </tr>
    <tr>
        <td>A</td>
        <td>1</td>
    </tr>
    <tr>
        <td>B</td>
        <td>2</td>
    </tr>
    <tr>
        <td>C</td>
        <td>3</td>
    </tr>
</table>'''

# Create the model and search for id attribute
df = sdm.from_html(raw_html, table_identifier="find-me")

# View output
print(df)

This will output:

┌───┬───────┬───────┐
│   │ Col 1 │ Col 2 │
├───┼───────┼───────┤
│ 0 │ A     │ 1     │
│ 1 │ B     │ 2     │
│ 2 │ C     │ 3     │
└───┴───────┴───────┘
[3 rows x 2 columns]

Note

  • **kwargs passed to method are used in urllib.request.urlopen if html_source is being considered as a web url.

  • **kwargs passed to method are used in open if html_source is being considered as a filepath.

  • The largest row size encountered will be used as the column_count for the returned SQLDataModel; shorter rows will be padded with None.

  • See utils.generate_html_table_chunks() for initial source chunking before content fed to SQLDataModel.HTMLParser.
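
The padding rule in the notes above, where the widest row sets the column count and shorter rows are filled with None, can be sketched in a few lines. `pad_rows` is a hypothetical helper used only to illustrate the rule.

```python
def pad_rows(rows):
    """Pad every row with None up to the width of the widest row."""
    column_count = max((len(row) for row in rows), default=0)
    return [list(row) + [None] * (column_count - len(row)) for row in rows]

ragged = [["Team", "Points", "Rank"], ["Brazil", "63.7"], ["England"]]
padded = pad_rows(ragged)
print(padded[1])  # ['Brazil', '63.7', None]
print(padded[2])  # ['England', None, None]
```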

Changelog:
  • Version 0.9.0 (2024-06-26):
    • Modified table_identifier default value to 1, changing from zero-based to one-based indexing for referencing target table in source to align with similar extraction methods throughout package.

  • Version 0.1.9 (2024-03-19):
    • New method.

classmethod from_json(json_source: str | list | dict, encoding: str = 'utf-8', **kwargs) SQLDataModel[source]

Creates a new SQLDataModel instance from JSON file path or JSON-like source, flattening if required.

Parameters:
  • json_source (str | list | dict) – The JSON source. If a string, it can represent a file path or a JSON-like object.

  • encoding (str) – The encoding to use when reading from a file. Defaults to ‘utf-8’.

  • **kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.

Returns:

A new SQLDataModel instance created from the JSON source.

Return type:

SQLDataModel

Raises:
  • TypeError – If the json_source argument is not of type ‘str’, ‘list’, or ‘dict’.

  • OSError – If related exception occurs when trying to open and read from json_source as file path.

Examples:

From JSON String Literal

import sqldatamodel as sdm

# Sample JSON string
json_data = '''[{
    "id": 1,
    "color": "red",
    "value": "#f00"
    },
    {
    "id": 2,
    "color": "green",
    "value": "#0f0"
    },
    {
    "id": 3,
    "color": "blue",
    "value": "#00f"
}]'''

# Create the model
df = sdm.from_json(json_data)

# View result
print(df)

This will output:

┌──────┬───────┬───────┐
│   id │ color │ value │
├──────┼───────┼───────┤
│    1 │ red   │ #f00  │
│    2 │ green │ #0f0  │
│    3 │ blue  │ #00f  │
└──────┴───────┴───────┘
[3 rows x 3 columns]

From JSON-like Object

import sqldatamodel as sdm

# JSON-like sample
json_data = [{
    "alpha": "A",
    "value": "1"
},
{
    "alpha": "B",
    "value": "2"
},
{
    "alpha": "C",
    "value": "3"
}]

# Create the model
df = sdm.from_json(json_data)

# Output
print(df)

This will output:

┌───────┬───────┐
│ alpha │ value │
├───────┼───────┤
│ A     │ 1     │
│ B     │ 2     │
│ C     │ 3     │
└───────┴───────┘
[3 rows x 2 columns]

From JSON file

import sqldatamodel as sdm

# JSON file path
json_data = 'data/json-sample.json'

# Create the model
df = sdm.from_json(json_data, encoding='latin-1')

# View output
print(df)

This will output:

┌──────┬────────┬───────┬─────────┐
│   id │ color  │ value │ notes   │
├──────┼────────┼───────┼─────────┤
│    1 │ red    │ #f00  │ primary │
│    2 │ green  │ #0f0  │         │
│    3 │ blue   │ #00f  │ primary │
│    4 │ cyan   │ #0ff  │         │
│    5 │ yellow │ #ff0  │         │
│    5 │ black  │ #000  │         │
└──────┴────────┴───────┴─────────┘
[6 rows x 4 columns]

Note

  • If json_source is deeply nested, it will be flattened according to the staticmethod utils.flatten_json().

  • If json_source is a JSON-like string object that is not an array, it will be wrapped in an array.
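
The two notes above can be pictured with the standard library `json` module. Both helpers below are hypothetical: `load_records` shows the wrapping of a lone object into an array, and `flatten` shows one possible flattening in which nested keys are joined with an underscore. The key naming actually produced by utils.flatten_json() is not documented here and may differ, and the file path case is left out.

```python
import json

def load_records(json_source):
    """Parse a JSON string if needed and always return a list of records."""
    parsed = json.loads(json_source) if isinstance(json_source, str) else json_source
    # A lone object is wrapped so that it becomes a one-row array
    return parsed if isinstance(parsed, list) else [parsed]

def flatten(record, parent_key="", sep="_"):
    """Assumed flattening scheme: nested keys are joined with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

source = '{"id": 1, "color": {"name": "red", "hex": "#f00"}}'
records = [flatten(record) for record in load_records(source)]
print(records)  # [{'id': 1, 'color_name': 'red', 'color_hex': '#f00'}]
```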

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

classmethod from_latex(latex_source: str, table_identifier: int = 1, encoding: str = 'utf-8', **kwargs) SQLDataModel[source]

Creates a new SQLDataModel instance from the provided LaTeX file or raw literal.

Parameters:
  • latex_source (str) – The LaTeX source containing one or more LaTeX tables. If latex_source is a valid system filepath, source will be treated as a .tex file and parsed. If latex_source is not a valid filepath, source will be parsed as raw LaTeX literal.

  • table_identifier (int, optional) – The index position of the LaTeX table to extract. Default is 1.

  • encoding (str, optional) – The file encoding to use if source is a LaTeX file path. Default is ‘utf-8’.

  • **kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel instance created from the parsed LaTeX table.

Return type:

SQLDataModel

Raises:
  • TypeError – If the latex_source argument is not of type ‘str’, or if the table_identifier argument is not of type ‘int’.

  • ValueError – If the table_identifier argument is less than 1, or if no tables are found in the LaTeX source.

  • IndexError – If the table_identifier is greater than the number of tables found in the LaTeX source.

Table Indices:
  • In the last example below, df will contain the data from the second table found in the LaTeX content.

  • Tables are indexed starting from index 1 at the top of the LaTeX content, incremented as they are found.

  • LaTeX parsing stops after the table specified at table_identifier is found without parsing the remaining content.
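
The indexing rules above can be sketched with the standard library `re` module. Because `re.finditer` is lazy, scanning stops as soon as the requested tabular environment is reached. `extract_table` and `parse_rows` are hypothetical helpers, the pattern assumes a simple column specification without nested braces, and the exception types only mirror the Raises section rather than the library's code.

```python
import re

# Matches one tabular environment and captures its body (simple column specs only)
TABULAR = re.compile(r"\\begin\{tabular\}(?:\{[^}]*\})?(.*?)\\end\{tabular\}", re.DOTALL)

def parse_rows(body):
    """Split a tabular body into rows on the row terminator and cells on '&'."""
    rows = []
    for chunk in re.split(r"\\\\", body):
        chunk = chunk.replace(r"\hline", "").strip()
        if chunk:
            rows.append([cell.strip().strip("{}") for cell in chunk.split("&")])
    return rows

def extract_table(latex_source, table_identifier=1):
    """Return the rows of the table at the 1-based position table_identifier."""
    if not isinstance(table_identifier, int):
        raise TypeError("table_identifier must be of type 'int'")
    if table_identifier < 1:
        raise ValueError("table_identifier must be 1 or greater")
    found = 0
    for found, match in enumerate(TABULAR.finditer(latex_source), start=1):
        if found == table_identifier:
            return parse_rows(match.group(1))  # later tables are never scanned
    if found == 0:
        raise ValueError("no tables found in LaTeX source")
    raise IndexError(f"only {found} table(s) found")

# A raw string keeps the backslashes intact
sample = r'''
\begin{tabular}{|l|l|}
\hline
    {Header A} & {Header B} \\
\hline
    Value A1 & Value B1 \\
\hline
\end{tabular}

\begin{tabular}{|l|l|}
\hline
    {Header X} & {Header Y} \\
\hline
    Value X1 & Value Y1 \\
\hline
\end{tabular}
'''
print(extract_table(sample, 2))  # [['Header X', 'Header Y'], ['Value X1', 'Value Y1']]
```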

Examples:

From LaTeX literal

import sqldatamodel as sdm

# Raw LaTeX literal, written as a raw string so backslashes are preserved
latex_content = r'''
\begin{tabular}{|l|r|r|}
\hline
    {Name} & {Age} & {Height} \\
\hline
    John    &   30 &  175.30 \\
    Alice   &   28 &  162.00 \\
    Michael &   35 &  185.80 \\
\hline
\end{tabular}
'''

# Create the model from the LaTeX
df = sdm.from_latex(latex_content)

# View result
print(df)

This will output:

┌─────────┬──────┬─────────┐
│ Name    │  Age │  Height │
├─────────┼──────┼─────────┤
│ John    │   30 │  175.30 │
│ Alice   │   28 │  162.00 │
│ Michael │   35 │  185.80 │
└─────────┴──────┴─────────┘
[3 rows x 3 columns]

From LaTeX file

import sqldatamodel as sdm

# Load LaTeX content from file
latex_file = 'path/to/latex/file.tex'

# Create the model using the path
df = sdm.from_latex(latex_file)

Specifying table identifier

import sqldatamodel as sdm

# Raw LaTeX literal with multiple tables, written as a raw string so backslashes are preserved
latex_content = r'''
%% LaTeX with a Table

\begin{tabular}{|l|l|}
\hline
    {Header A} & {Header B} \\
\hline
    Value A1 & Value B1 \\
    Value A2 & Value B2 \\
\hline
\end{tabular}

%% Then another Table

\begin{tabular}{|l|l|}
\hline
    {Header X} & {Header Y} \\
\hline
    Value X1 & Value Y1 \\
    Value X2 & Value Y2 \\
\hline
\end{tabular}
'''

# Create the model from the 2nd table
df = sdm.from_latex(latex_content, table_identifier=2)

# View output
print(df)

This will output:

┌──────────┬──────────┐
│ Header X │ Header Y │
├──────────┼──────────┤
│ Value X1 │ Value Y1 │
│ Value X2 │ Value Y2 │
└──────────┴──────────┘
[2 rows x 2 columns]

Note

  • LaTeX tables are identified based on the presence of tabular environments: \begin{tabular}...\end{tabular}.

  • The table_identifier specifies which table to extract when multiple tables are present, beginning at position ‘1’ from the top of the source.

  • The provided kwargs are passed to the SQLDataModel constructor for additional parameters to the instance returned.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

classmethod from_markdown(markdown_source: str, table_identifier: int = 1, **kwargs) SQLDataModel[source]

Creates a new SQLDataModel instance from the provided Markdown source file or raw content.

If markdown_source is a valid system path, the markdown file will be parsed. Otherwise, the provided string will be parsed as raw markdown.

Parameters:
  • markdown_source (str) – The Markdown source file path or raw content.

  • table_identifier (int, optional) – The index position of the markdown table to extract. Default is 1.

  • **kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.

Raises:
  • TypeError – If the markdown_source argument is not of type ‘str’, or if the table_identifier argument is not of type ‘int’.

  • ValueError – If the table_identifier argument is less than 1, or if no tables are found in the markdown source.

  • IndexError – If the table_identifier is greater than the number of tables found in the markdown source.

Returns:

The SQLDataModel instance created from the parsed markdown table.

Return type:

SQLDataModel

Table indices:
  • In the last example below, df will contain the data from the second table found in the markdown content.

  • Tables are indexed starting from index 1 at the top of the markdown content, incremented as they are found.

  • Markdown parsing stops after the table specified at table_identifier is found without parsing the remaining content.

Examples:

From Markdown Literal

import sqldatamodel as sdm

# Raw markdown literal
markdown_content = '''
| Item          | Price | # In stock |
|---------------|-------|------------|
| Juicy Apples  | 1.99  | 37         |
| Bananas       | 1.29  | 52         |
| Pineapple     | 3.15  | 14         |
'''

# Create the model from the markdown
df = sdm.from_markdown(markdown_content)

# View result
print(df)

This will output:

┌──────────────┬───────┬────────────┐
│ Item         │ Price │ # In stock │
├──────────────┼───────┼────────────┤
│ Juicy Apples │ 1.99  │ 37         │
│ Bananas      │ 1.29  │ 52         │
│ Pineapple    │ 3.15  │ 14         │
└──────────────┴───────┴────────────┘
[3 rows x 3 columns]

From Markdown File

import sqldatamodel as sdm

# Load markdown content from file
markdown_file_path = 'path/to/markdown_file.md'

# Create the model using the path
df = sdm.from_markdown(markdown_file_path)

Specifying Table Identifier

import sqldatamodel as sdm

# Raw markdown literal with multiple tables
markdown_content = '''
### Markdown with a Table

| Header A | Header B |
|----------|----------|
| Value A1 | Value B1 |
| Value A2 | Value B2 |

### Then another Table

| Header X | Header Y |
|----------|----------|
| Value X1 | Value Y1 |
| Value X2 | Value Y2 |

'''

# Create the model from the 2nd table
df = sdm.from_markdown(markdown_content, table_identifier=2)

# View output
print(df)

This will output:

┌──────────┬──────────┐
│ Header X │ Header Y │
├──────────┼──────────┤
│ Value X1 │ Value Y1 │
│ Value X2 │ Value Y2 │
└──────────┴──────────┘
[2 rows x 2 columns]

Note

  • Markdown tables are identified based on the presence of pipe characters | defining table cells.

  • The table_identifier specifies which table to extract when multiple tables are present, beginning at position ‘1’ from the top of the source.

  • Escaped pipe characters \| within the markdown are replaced with the HTML entity reference &vert; for proper parsing.

  • The provided kwargs are passed to the SQLDataModel constructor for additional parameters to the instance returned.
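
The handling of pipes described in the notes above can be sketched as follows: escaped pipes are swapped for the `&vert;` entity before the row is split on the remaining pipe characters. `split_markdown_row` and `is_separator` are hypothetical helpers for illustration, not the library's parser.

```python
def split_markdown_row(line):
    """Split one table row into cells, protecting escaped pipes first."""
    protected = line.replace("\\|", "&vert;")
    return [cell.strip() for cell in protected.strip().strip("|").split("|")]

def is_separator(cells):
    """True for the |---|---| row that sits between header and body."""
    return all(cell and set(cell) <= set("-: ") for cell in cells)

markdown = r'''
| Item          | Price |
|---------------|-------|
| Apples \| Red | 1.99  |
'''

# Table lines are recognized here by a leading pipe character
rows = [split_markdown_row(line) for line in markdown.splitlines() if line.strip().startswith("|")]
header = rows[0]
body = [row for row in rows[1:] if not is_separator(row)]
print(header)  # ['Item', 'Price']
print(body)    # [['Apples &vert; Red', '1.99']]
```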

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

classmethod from_numpy(array, headers: list[str] = None, **kwargs) SQLDataModel[source]

Returns a SQLDataModel object created from the provided numpy array.

Parameters:
  • array (numpy.ndarray) – The numpy array to convert to a SQLDataModel.

  • headers (list of str, optional) – The list of headers to use for the SQLDataModel. If None, no headers will be used, and the data will be treated as an n-dimensional array. Default is None.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the numpy array.

Return type:

SQLDataModel

Raises:
  • ModuleNotFoundError – If the required package numpy is not found.

  • TypeError – If array argument is not of type numpy.ndarray.

  • DimensionError – If array.ndim != 2 representing a (row, column) tabular array.

Example:

import numpy as np
import sqldatamodel as sdm

# Sample array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Create the model with custom headers
df = sdm.from_numpy(arr, headers=['Col A', 'Col B', 'Col C'])

# View output
print(df)

This will output:

┌───────┬───────┬───────┐
│ Col A │ Col B │ Col C │
├───────┼───────┼───────┤
│     1 │     2 │     3 │
│     4 │     5 │     6 │
│     7 │     8 │     9 │
└───────┴───────┴───────┘
[3 rows x 3 columns]

Note

  • Numpy array must have ‘2’ dimensions, the first representing the rows, and the second the columns.

  • If no headers are provided, default headers will be generated as ‘col_N’ where N represents the column integer index.

Changelog:
  • Version 0.1.3 (2023-10-15):
    • New method.

classmethod from_pandas(df, headers: list[str] = None, **kwargs) SQLDataModel[source]

Returns a SQLDataModel object created from the provided df representing a Pandas DataFrame object. Note that pandas must be installed in order to use this method.

Parameters:
  • df (pandas.DataFrame) – The pandas DataFrame to convert to a SQLDataModel.

  • headers (list[str], optional) – The list of headers to use for the SQLDataModel. Default is None, using the columns from the df object.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the pandas DataFrame.

Return type:

SQLDataModel

Raises:
  • ModuleNotFoundError – If the required package pandas is not found.

  • TypeError – If df argument is not of type pandas.DataFrame.

Example:

import pandas as pd
import sqldatamodel as sdm

# Create a pandas DataFrame
df_pd = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})

# Create the model
df_sdm = sdm.from_pandas(df_pd)

Note

  • If headers are not provided, the existing pandas columns will be used as the new SQLDataModel headers.

Changelog:
  • Version 0.1.3 (2023-10-15):
    • New method.

classmethod from_parquet(filename: str, **kwargs) SQLDataModel[source]

Returns a new SQLDataModel instance from the specified parquet file.

Parameters:
  • filename (str) – The file path to the parquet file, e.g., filename = 'user/data/titanic.parquet'.

  • **kwargs – Additional keyword arguments to pass to the pyarrow read_table function, e.g., filters = [('Name','=','Alice')].

Returns:

A new instance of SQLDataModel created from the parquet file.

Return type:

SQLDataModel

Raises:
  • ModuleNotFoundError – If the required package pyarrow is not installed as determined by optionals._has_pa flag.

  • TypeError – If the filename argument is not of type ‘str’ representing a valid parquet file path.

  • FileNotFoundError – If the specified parquet filename is not found.

  • Exception – If any unexpected exception occurs during the file or parquet reading process.

Example:

import sqldatamodel as sdm

# Sample parquet file
pq_file = "titanic.parquet"

# Create the model
df = sdm.from_parquet(pq_file)

# View column counts
print(df.count())

This will output:

┌────┬─────────────┬──────┬────────┬───────┬───────┐
│    │ column      │   na │ unique │ count │ total │
├────┼─────────────┼──────┼────────┼───────┼───────┤
│  0 │ PassengerId │    0 │    891 │   891 │   891 │
│  1 │ Survived    │    0 │      2 │   891 │   891 │
│  2 │ Pclass      │    0 │      3 │   891 │   891 │
│  3 │ Name        │    0 │    891 │   891 │   891 │
│  4 │ Sex         │    0 │      2 │   891 │   891 │
│  5 │ Age         │  177 │     88 │   714 │   891 │
│  6 │ SibSp       │    0 │      7 │   891 │   891 │
│  7 │ Parch       │    0 │      7 │   891 │   891 │
│  8 │ Ticket      │    0 │    681 │   891 │   891 │
│  9 │ Fare        │    0 │    248 │   891 │   891 │
│ 10 │ Cabin       │  687 │    147 │   204 │   891 │
│ 11 │ Embarked    │    2 │      3 │   889 │   891 │
└────┴─────────────┴──────┴────────┴───────┴───────┘
[12 rows x 5 columns]

classmethod from_pickle(filename: str = None, **kwargs) SQLDataModel[source]

Returns the SQLDataModel object from the provided filename. If None, the current directory will be scanned for the default SQLDataModel.to_pickle() format.

Parameters:
  • filename (str, optional) – The name of the pickle file to load. If None, the current directory will be scanned for the default filename. Default is None.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor, these will override the properties loaded from filename.

Returns:

The SQLDataModel object created from the loaded pickle file.

Return type:

SQLDataModel

Raises:
  • TypeError – If filename is provided but is not of type ‘str’ representing a valid pickle filepath.

  • FileNotFoundError – If the provided filename could not be found or does not exist.

Example:

import sqldatamodel as sdm

headers = ['Name','Age','Sex']
data = [('Alice', 20, 'F'), ('Bob', 25, 'M'), ('Gerald', 30, 'M')]

# Create the model with sample data
df = sdm.SQLDataModel(data=data, headers=headers)

# Filepath
pkl_file = 'people.sdm'

# Save the model
df.to_pickle(filename=pkl_file)

# Load it back from file
df = sdm.from_pickle(filename=pkl_file)

Note

  • All data, headers, data types and display properties will be saved when pickling.

  • Any additional kwargs provided will override those saved in the pickled model.
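
The override behavior in the notes above amounts to merging the saved properties with the caller's keyword arguments, with the latter taking precedence. The sketch below uses the standard library `pickle` module and a plain dictionary. The payload layout and the helper `load_with_overrides` are assumptions for illustration and do not describe the actual file format written by SQLDataModel.to_pickle().

```python
import pickle

# Assumed payload layout: constructor arguments stored as a dictionary
saved = {
    "headers": ["Name", "Age"],
    "data": [("Alice", 20), ("Bob", 25)],
    "display_index": True,
    "display_float_precision": 2,
}
blob = pickle.dumps(saved)

def load_with_overrides(blob, **kwargs):
    """Restore the saved properties, letting explicit kwargs win."""
    restored = pickle.loads(blob)
    return {**restored, **kwargs}

model_args = load_with_overrides(blob, display_index=False)
print(model_args["display_index"])            # False
print(model_args["display_float_precision"])  # 2
```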

classmethod from_polars(df, headers: list[str] = None, **kwargs) SQLDataModel[source]

Returns a SQLDataModel object created from the provided df representing a Polars DataFrame object. Note that polars must be installed in order to use this method.

Parameters:
  • df (polars.DataFrame) – The Polars DataFrame to convert to a SQLDataModel.

  • headers (list[str], optional) – The list of headers to use for the SQLDataModel. Default is None, using the columns from the df object.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the Polars DataFrame.

Return type:

SQLDataModel

Raises:
  • ModuleNotFoundError – If the required package polars is not found.

  • TypeError – If df argument is not of type polars.DataFrame.

Example:

import polars as pl
import sqldatamodel as sdm

# Sample data
data = {
    'Name': ['Beth', 'John', 'Alice', 'Travis'],
    'Age': [27, 30, 28, 35],
    'Height': [172.4, 175.3, 162.0, 185.8]
}

# Create the polars DataFrame
df_pl = pl.DataFrame(data)

# Create a SQLDataModel object
df_sdm = sdm.from_polars(df_pl)

# View result
print(df_sdm)

This will output a SQLDataModel constructed from the Polars df_pl:

┌────────┬─────┬─────────┐
│ Name   │ Age │  Height │
├────────┼─────┼─────────┤
│ Beth   │  27 │  172.40 │
│ John   │  30 │  175.30 │
│ Alice  │  28 │  162.00 │
│ Travis │  35 │  185.80 │
└────────┴─────┴─────────┘
[4 rows x 3 columns]

Note

  • If headers are not provided, the provided DataFrame’s columns will be used as the new SQLDataModel headers.

  • Polars uses different data types than those used by SQLDataModel, see SQLDataModel.set_column_dtypes() for specific casting rules.

  • See related SQLDataModel.to_polars() for the inverse method of converting a SQLDataModel into a Polars DataFrame object.

Changelog:
  • Version 0.3.8 (2024-04-12):
    • New method.

classmethod from_pyarrow(table, **kwargs) SQLDataModel[source]

Returns a new SQLDataModel instance from the provided Apache Arrow object.

Parameters:
  • table (pyarrow.lib.Table) – Apache Arrow object from which to construct a new SQLDataModel object.

  • **kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.

Raises:
  • ModuleNotFoundError – If the required package pyarrow is not installed.

  • TypeError – If the provided table argument is not of type ‘pyarrow.lib.Table’.

Returns:

A new SQLDataModel instance representing the data in the provided Apache Arrow object.

Return type:

SQLDataModel

Example:

import pyarrow as pa
import sqldatamodel as sdm

# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Grade': [3.8, 3.9, 3.2],
}

# Create PyArrow table from data
table = pa.Table.from_pydict(data)

# Create model from PyArrow table
df = sdm.from_pyarrow(table)

# View result
print(df)

This will output:

┌─────────┬──────┬───────┐
│ Name    │  Age │ Grade │
├─────────┼──────┼───────┤
│ Alice   │   25 │  3.80 │
│ Bob     │   30 │  3.90 │
│ Charlie │   35 │  3.20 │
└─────────┴──────┴───────┘
[3 rows x 3 columns]

Changelog:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.2.3 (2024-03-28):
    • New method.

classmethod from_shape(shape: tuple[int, int], fill: Any = None, headers: list[str] = None, dtype: Literal['bytes', 'date', 'datetime', 'float', 'int', 'str'] = None, **kwargs) SQLDataModel[source]

Returns a SQLDataModel from shape (M rows, N columns) as a convenience method to quickly build a model through an iterative approach. By default, no particular data type is assigned given the flexibility of sqlite3; however, one can be inferred by providing an initial fill value or set explicitly by providing the dtype argument.

Parameters:
  • shape (tuple[int, int]) – The shape to initialize the SQLDataModel with as (M, N) where M is the number of rows and N is the number of columns.

  • fill (Any, optional) – The scalar fill value to populate the new SQLDataModel with. Default is None, using SQL null values or deriving from dtype if provided.

  • headers (list[str], optional) – The headers to use for the model. Default is None, incrementing headers 0, 1, ..., N where N is the number of columns.

  • dtype (str, optional) – A valid python or SQL datatype to initialize the n-dimensional model with. Default is None, using the SQL text type.

  • **kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.

Raises:
  • TypeError – If M or N are not of type ‘int’ representing a valid shape to initialize a SQLDataModel with.

  • ValueError – If M or N are not positive integer values representing valid nonzero row and column dimensions.

  • ValueError – If dtype is not a valid python or SQL convertible datatype to initialize the model with.

Returns:

Instance with the specified number of rows and columns, initialized with dtype-derived fill values or with None values (default).

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Create a 3x3 model filled by 'X'
df = sdm.from_shape((3,3), fill='X')

# View it
print(df)

This will output a 3x3 grid of ‘X’ characters:

┌───┬─────┬─────┬─────┐
│   │ 0   │ 1   │ 2   │
├───┼─────┼─────┼─────┤
│ 0 │ X   │ X   │ X   │
│ 1 │ X   │ X   │ X   │
│ 2 │ X   │ X   │ X   │
└───┴─────┴─────┴─────┘
[3 rows x 3 columns]

We can iteratively build the model from the shape dimensions:

import sqldatamodel as sdm

# Define shape
shape = (6,6)

# Initialize the multiplication table with integer dtypes
mult_table = sdm.from_shape(shape=shape, dtype='int')

# Construct the table values
for x in range(shape[0]):
    for y in range(shape[1]):
        mult_table[x, y] = x * y

# View the multiplication table
print(mult_table)

This will output our 6x6 multiplication table:

┌───┬─────┬─────┬─────┬─────┬─────┬─────┐
│   │   0 │   1 │   2 │   3 │   4 │   5 │
├───┼─────┼─────┼─────┼─────┼─────┼─────┤
│ 0 │   0 │   0 │   0 │   0 │   0 │   0 │
│ 1 │   0 │   1 │   2 │   3 │   4 │   5 │
│ 2 │   0 │   2 │   4 │   6 │   8 │  10 │
│ 3 │   0 │   3 │   6 │   9 │  12 │  15 │
│ 4 │   0 │   4 │   8 │  12 │  16 │  20 │
│ 5 │   0 │   5 │  10 │  15 │  20 │  25 │
└───┴─────┴─────┴─────┴─────┴─────┴─────┘
[6 rows x 6 columns]

Note

  • If both fill and dtype are provided, the data type will be derived from type(fill) overriding or ignoring the specified dtype.

  • If only dtype is provided, sensible default initialization fill values will be used to populate the model such as 0 or 0.0 for numeric and empty string or null for others.

  • For those data types not natively implemented by sqlite3 such as date and datetime, today’s date and now’s datetime will be used respectively for initialization values.
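
The precedence between fill and dtype described in the notes above can be sketched as a small resolver. `resolve_fill` and the `DEFAULT_FILLS` table are hypothetical. The defaults for ‘str’ and ‘bytes’ in particular are assumptions, since the notes only say "empty string or null" for non-numeric types.

```python
import datetime

# Assumed per-dtype initialization values, following the notes above
DEFAULT_FILLS = {
    "int": lambda: 0,
    "float": lambda: 0.0,
    "str": lambda: "",
    "bytes": lambda: b"",
    "date": datetime.date.today,
    "datetime": datetime.datetime.now,
}

def resolve_fill(fill=None, dtype=None):
    """Return (fill_value, dtype_name) following the documented precedence."""
    if fill is not None:
        return fill, type(fill).__name__  # fill wins, dtype is ignored
    if dtype is None:
        return None, None                 # plain null values, no dtype assigned
    if dtype not in DEFAULT_FILLS:
        raise ValueError(f"invalid dtype '{dtype}'")
    return DEFAULT_FILLS[dtype](), dtype

def grid_from_shape(shape, fill=None, dtype=None):
    """Build an M x N list of lists populated with the resolved fill value."""
    n_rows, n_cols = shape
    value, _ = resolve_fill(fill, dtype)
    return [[value] * n_cols for _ in range(n_rows)]

print(resolve_fill(fill="X", dtype="int"))  # ('X', 'str')
print(resolve_fill(dtype="float"))          # (0.0, 'float')
print(grid_from_shape((2, 3), dtype="int")) # [[0, 0, 0], [0, 0, 0]]
```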

Changelog:
  • Version 0.5.2 (2024-05-13):
    • Added shape parameter in lieu of separate n_rows and n_cols arguments.

    • Added fill parameter to populate resulting SQLDataModel with values to override type-specific initialization defaults.

    • Added headers parameter to explicitly set column names when creating the SQLDataModel.

    • Added **kwargs parameter to align more closely with usage patterns of other model initializing constructor methods.

  • Version 0.1.9 (2024-03-19):
    • New method.

classmethod from_sql(sql: str, con: Connection | Any, dtypes: dict = None, **kwargs) SQLDataModel[source]

Create a SQLDataModel object by executing the provided SQL query using the specified SQL connection. If a single word is provided as the sql, the method wraps it and executes a select all treating the text as the target table.
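
The single word behaviour can be pictured with the standard library alone. The helper `as_query` below is hypothetical and only approximates the wrapping described above; the actual rule used by the method may differ.

```python
import sqlite3

def as_query(sql: str) -> str:
    """Hypothetical helper: wrap a bare table name in a select-all query."""
    stripped = sql.strip()
    # A single token with no whitespace is treated as a table name
    if len(stripped.split()) == 1:
        return f'select * from "{stripped}"'
    return stripped

# In-memory database used only for this illustration
con = sqlite3.connect(':memory:')
con.execute('create table users (name text, age integer)')
con.executemany('insert into users values (?, ?)', [('Alice', 25), ('Bob', 30)])

rows_from_name = con.execute(as_query('users')).fetchall()
rows_from_query = con.execute(as_query('select * from users')).fetchall()
print(rows_from_name == rows_from_query)  # True
```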

Supported Connection APIs:
  • SQLite using sqlite3 or url with format 'file:///path/to/database.db'

  • PostgreSQL using psycopg2 or url with format 'postgresql://user:pass@hostname:port/db'

  • SQL Server ODBC using pyodbc or url with format 'mssql://user:pass@hostname:port/db'

  • Oracle using cx_Oracle or url with format 'oracle://user:pass@hostname:port/db'

  • Teradata using teradatasql or url with format 'teradata://user:pass@hostname:port/db'

Parameters:
  • sql (str) – The SQL query to execute and use to create the SQLDataModel.

  • con (sqlite3.Connection | Any) – The database connection object or url. Supported connection APIs are sqlite3, psycopg2, pyodbc, cx_Oracle, teradatasql.

  • dtypes (dict, optional) – A dictionary of the format 'column': 'python dtype' to assign to values. Default is None, mapping types from source connection.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the executed SQL query.

Return type:

SQLDataModel

Raises:
  • TypeError – If dtypes argument is provided and is not of type dict representing python data types to assign to values.

  • SQLProgrammingError – If the provided SQL connection is not opened or valid, or the SQL query is invalid or malformed.

  • ModuleNotFoundError – If con is provided as a connection url and the specified scheme driver module is not found.

  • DimensionError – If the provided SQL query returns no data.

Examples:

From SQL Table

import sqlite3
import sqldatamodel as sdm

# Open a connection to the source database
con = sqlite3.connect('./database/users.db')

# Single word parameter
df = sdm.from_sql("table_name", con)

# Equivalent query executed
df = sdm.from_sql("select * from table_name", con)

From SQLite Database

import sqlite3
import sqldatamodel as sdm

# Create connection object
sqlite_db_conn = sqlite3.connect('./database/users.db')

# Basic usage with a select query
df = sdm.from_sql("SELECT * FROM my_table", sqlite_db_conn)

# When a single word is provided, it is treated as a table name for a select all query
df_table = sdm.from_sql("my_table", sqlite_db_conn)

From PostgreSQL Database

import psycopg2
import sqldatamodel as sdm

# Create connection object
pg_db_conn = psycopg2.connect('dbname=users user=postgres password=postgres')

# Basic usage with a select query
df = sdm.from_sql("SELECT * FROM my_table", pg_db_conn)

# When a single word is provided, it is treated as a table name for a select all query
df_table = sdm.from_sql("my_table", pg_db_conn)

From SQL Server Database

import pyodbc
import sqldatamodel as sdm

# Create connection object
con = pyodbc.connect("DRIVER={SQL Server};SERVER=host;DATABASE=db;UID=user;PWD=pw;")

# Basic usage with a select query
df = sdm.from_sql("SELECT * FROM my_table", con)

# When a single word is provided, it is treated as a table name for a select all query
df_table = sdm.from_sql("my_table", con)

Note

  • When con is provided as a string and the path exists, a local sqlite3 connection will be attempted; if the path does not exist, a connection will be attempted using utils._create_connection().

  • When con is provided as an object a connection is assumed to be open and valid, if a cursor cannot be created from the object an exception will be raised.

  • An unsupported connection object will emit a SQLDataModelWarning advising of unstable or undefined behaviour.

  • The dtypes, if provided, are only applied to sqlite3 connection objects as remaining supported connections implement SQL to python adapters.

  • See related SQLDataModel.to_sql() for writing to SQL database connections.

  • See utility methods utils._parse_connection_url() and utils._create_connection() for implementation on creating database connections from urls.
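
To see how a connection url of the form 'scheme://user:pass@host:port/db' breaks down, the sketch below uses the standard library's urllib.parse. The helper `split_connection_url` is hypothetical and is not the package's utils._parse_connection_url(); the host and port values are made up for the example.

```python
from urllib.parse import urlsplit

def split_connection_url(url: str) -> dict:
    """Hypothetical helper: break a connection url into its components."""
    parts = urlsplit(url)
    return {
        'scheme': parts.scheme,
        'user': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        'db': parts.path.lstrip('/'),
    }

details = split_connection_url('postgresql://user:pass@localhost:5432/db')
print(details['scheme'], details['host'], details['port'], details['db'])
# postgresql localhost 5432 db
```

The scheme is the part that would select the driver module, which is why a missing driver surfaces as ModuleNotFoundError.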

Changelog:
  • Version 2.3.0 (2026-01-21):
    • Modified to allow returning an empty result set from the execution of sql, constructing an empty model using the headers returned from the cursor.

  • Version 0.9.1 (2024-06-27):
    • Modified handling of con parameter to allow database connection url to also be provided as 'scheme://user:pass@host:port/db'

  • Version 0.8.2 (2024-06-24):
    • Modified handling of con parameter to allow providing SQLite database filepath directly as string to instantiate connection.

  • Version 0.3.0 (2024-03-31):
    • Renamed sql_query parameter to sql for consistency with similar method arguments.

classmethod from_text(text_source: str, table_identifier: int = 1, encoding: str = 'utf-8', headers: list[str] = None, **kwargs) SQLDataModel[source]

Returns a new SQLDataModel generated from the provided text_source, either as a file if the path exists, or from a raw string literal if the path does not exist.

Parameters:
  • text_source (str) – The path to the tabular data file or a raw string literal containing tabular data.

  • table_identifier (int, optional) – The index position of the target table within the text source. Default is 1.

  • encoding (str, optional) – The encoding used to decode the text source if it is a file. Default is ‘utf-8’.

  • headers (list, optional) – The headers to use for the provided data. Default is to use the first row.

  • **kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the provided tabular data.

Return type:

SQLDataModel

Raises:
  • TypeError – If text_source is not a string or table_identifier is not an integer.

  • ValueError – If no tabular data is found in text_source, if parsing fails to extract valid tabular data, or if the provided table_identifier is out of range.

  • IndexError – If the provided table_identifier exceeds the number of tables found in text_source.

  • Exception – If an error occurs while attempting to read from or process the provided text_source.

Example:

import sqldatamodel as sdm

# Text source containing tabular data
text_source = "/path/to/tabular_data.txt"

# Create the model using the text source
df = sdm.from_text(text_source, table_identifier=2)

Note

  • This method is made for parsing SQLDataModel formatted text, such as the kind generated with print(df) or the output created by the inverse method SQLDataModel.to_text()

  • For parsing other delimited tabular data, this method calls the related SQLDataModel.from_csv() method, which parses tabular data constructed with common delimiters.
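
As a rough picture of what parsing SQLDataModel formatted text involves, the sketch below splits rows on the vertical box character. The function `parse_box_table` is hypothetical and much simpler than the real parser: it handles a single table and leaves every value as a string.

```python
def parse_box_table(text: str) -> tuple[list[str], list[list[str]]]:
    """Hypothetical parser: split header and data rows on the vertical box character."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Header and data rows start with the vertical bar; border rows do not
        if not line.startswith('│'):
            continue
        cells = [cell.strip() for cell in line.strip('│').split('│')]
        rows.append(cells)
    if not rows:
        raise ValueError('no tabular data found')
    # First row is treated as the headers, the rest as data
    return rows[0], rows[1:]

table = '''
┌───────┬─────┐
│ Name  │ Age │
├───────┼─────┤
│ Alice │  25 │
│ Bob   │  30 │
└───────┴─────┘
'''
headers, data = parse_box_table(table)
print(headers)  # ['Name', 'Age']
print(data)     # [['Alice', '25'], ['Bob', '30']]
```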

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

classmethod from_xml(xml_source: str, orient: Literal['rows', 'columns'] = 'rows', row_tag: str = 'row', column_tag: str = 'column', value_tag: str = 'value', root_tag: str | None = None, encoding: str = 'utf-8', infer_types: bool = True, **kwargs) SQLDataModel[source]

Creates a new SQLDataModel instance from an XML source.

Parameters:
  • xml_source (str) – File path, URL, or raw XML string.

  • orient (Literal['rows','columns']) – Orientation of XML data where ‘rows’ treats each row as a record and ‘columns’ treats each column as a list of values.

  • row_tag (str) – Row tag name when orient='rows'.

  • column_tag (str) – Column tag name when orient='columns'.

  • value_tag (str) – Value tag name inside column elements.

  • root_tag (str | None) – Optional root element selector.

  • encoding (str) – Encoding for file or URL input.

  • infer_types (bool) – Whether to infer column types.

Returns:

The SQLDataModel object created from the provided XML data.

Return type:

SQLDataModel

Raises:
  • TypeError – If xml_source is not a string type.

  • ValueError – If value for orient is not one of ‘rows’ or ‘columns’ representing the data orientation.

Example:

import sqldatamodel as sdm

# XML data as string literal
xml_literal = '''
<data>
    <row>
        <Name>Alice</Name>
        <Age>25</Age>
        <Grade>3.8</Grade>
    </row>
    <row>
        <Name>Bob</Name>
        <Age>30</Age>
        <Grade>3.9</Grade>
    </row>
    <row>
        <Name>Charlie</Name>
        <Age>35</Age>
        <Grade>3.2</Grade>
    </row>
</data>'''

# Create the model from the XML data
df = sdm.SQLDataModel.from_xml(xml_literal)

# View the resulting model
print(df)

This will output:

┌───┬─────────┬─────┬───────┐
│   │ Name    │ Age │ Grade │
├───┼─────────┼─────┼───────┤
│ 0 │ Alice   │  25 │  3.80 │
│ 1 │ Bob     │  30 │  3.90 │
│ 2 │ Charlie │  35 │  3.20 │
└───┴─────────┴─────┴───────┘
[3 rows x 3 columns]

Alternatively, column names can be parsed from name attributes of <col> tags:

import sqldatamodel as sdm

# Sample XML str literal
xml = '''
<data>
    <row>
        <col name="1">Alice</col>
        <col name="2">30</col>
    </row>
    <row>
        <col name="1">Bob</col>
        <col name="2">25</col>
    </row>
</data>
'''

df = sdm.SQLDataModel.from_xml(xml)
print(df.headers) # [1, 2]

print(df.to_json(index=False))
# [{"1": "Alice", "2": 30}, {"1": "Bob", "2": 25}]

Note

  • The headers will be parsed from either a direct self-named <COLUMN_NAME> tag, or from a generic <col> tag’s name attribute if serialized accordingly.
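
The two header conventions can be illustrated with xml.etree.ElementTree from the standard library. The function `rows_from_xml` is a hypothetical stand-in covering only the 'rows' orientation, and unlike the method above it performs no type inference, so values stay as strings.

```python
import xml.etree.ElementTree as ET

def rows_from_xml(xml_text: str, row_tag: str = 'row') -> tuple[list[str], list[list[str]]]:
    """Hypothetical reader: header names come from the tag or its name attribute."""
    root = ET.fromstring(xml_text)
    headers: list[str] = []
    records = []
    for row in root.iter(row_tag):
        record = {}
        for child in row:
            # Prefer a name attribute, otherwise fall back to the tag itself
            key = child.get('name', child.tag)
            record[key] = child.text
            if key not in headers:
                headers.append(key)
        records.append(record)
    return headers, [[record.get(h) for h in headers] for record in records]

xml_named = '<data><row><Name>Alice</Name><Age>25</Age></row></data>'
xml_generic = '<data><row><col name="Name">Bob</col><col name="Age">30</col></row></data>'

print(rows_from_xml(xml_named))    # (['Name', 'Age'], [['Alice', '25']])
print(rows_from_xml(xml_generic))  # (['Name', 'Age'], [['Bob', '30']])
```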

Changelog:
  • Version 2.3.1 (2026-01-22):
    • New method.

generate_apply_function_stub() str[source]

Generates a function template using the current SQLDataModel to format function arguments for the SQLDataModel.apply_function_to_column() method.

Returns:

A string representing the function template.

Return type:

str

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('data.csv')

# Create the stub
stub = df.generate_apply_function_stub()

# View it
print(stub)

This will output:

def func(user_name:str, user_age:int, user_salary:float):
    # apply logic and return value
    return

The stub contains all the required inputs and column names needed to write a compatible function to apply to the model, and can be copied and pasted into existing code.

Note

  • This method is meant as a general informative tool or for debugging assistance if needed.

  • See SQLDataModel.apply() method for usage and implementation of functions in SQLDataModel using sqlite3
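
A stub of this shape can be produced from a mapping of column names to dtypes. The sketch below is an assumption about the general approach, not the method's source; `build_stub` is a hypothetical name and it assumes every column name is a valid Python identifier.

```python
def build_stub(dtypes: dict[str, str], name: str = 'func') -> str:
    """Hypothetical generator: turn {'column': 'dtype'} into a function template."""
    args = ', '.join(f'{col}:{dtype}' for col, dtype in dtypes.items())
    return f'def {name}({args}):\n    # apply logic and return value\n    return'

stub = build_stub({'user_name': 'str', 'user_age': 'int', 'user_salary': 'float'})
print(stub)
```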

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

get_column_alignment() str[source]

Returns the current column_alignment property value, dynamic by default.

Returns:

The current value of the column_alignment property.

Return type:

str

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get the current alignment value
alignment = df.get_column_alignment()

# Outputs 'dynamic'
print(alignment)

get_column_dtypes(columns: str | int | list = None, dtypes: Literal['python', 'sql'] = 'python') dict[source]

Get the data types of specified columns as either Python or SQL datatypes as a dict in the format of {'column': 'dtype'}.

Parameters:
  • columns (str | int | list) – The column or columns for which to retrieve data types. Defaults to all columns.

  • dtypes (Literal["python", "sql"]) – The format in which to retrieve data types. Defaults to “python”.

Raises:
  • TypeError – If columns is not of type str, int, or list.

  • IndexError – If columns is of type int and the index is outside the valid range.

  • ValueError – If a specified column in columns is not found in the current dataset. Use SQLDataModel.get_headers() to view valid columns.

Returns:

A dictionary mapping column names to their data types.

Return type:

dict

Example:

import sqldatamodel as sdm

# Sample data
headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01'),
    ('Sarah', 'West', 39, 0.7, '2023-10-01'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get all column python dtypes
df_dtypes = df.get_column_dtypes()

# View dict items
for col, dtype in df_dtypes.items():
    print(f"{col}: {dtype}")

This will output:

first: str
last: str
age: int
service: float
hire_date: date

Get SQL data types as well:

# Get specific column sql dtypes
df_dtypes = df.get_column_dtypes(columns=['first','age','service'], dtypes="sql")

# View dict items
for col, dtype in df_dtypes.items():
    print(f"{col}: {dtype}")

This will output:

first: TEXT
age: INTEGER
service: REAL

Note

  • SQLDataModel index column is not included, only columns specified in the SQLDataModel.headers attribute are in scope.

  • Only the dtypes are returned, any primary key references are removed to ensure compatibility with external calls.

  • Python datatypes are returned in lower case, while SQL dtypes are returned in upper case to reflect convention.

  • See SQLDataModel.dtypes for direct mapping from column to Python data type returned as {'col': 'dtype'}.
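
The SQL names shown above correspond to SQLite storage classes. As a small check that uses only the standard sqlite3 module and no part of this package, SQLite can be asked which storage class it assigns to a bound Python value:

```python
import sqlite3

con = sqlite3.connect(':memory:')

# Ask SQLite which storage class it assigns to each bound Python value
row = con.execute(
    'select typeof(?), typeof(?), typeof(?), typeof(?)',
    ('John', 27, 1.22, None),
).fetchone()

print(row)  # ('text', 'integer', 'real', 'null')
```

SQLite reports these names in lower case, while this method returns SQL dtypes in upper case.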

Changelog:
  • Version 0.8.0 (2024-06-21):
    • Modified to allow the columns argument to be provided as any valid reference, including integer indexes or an iterable sequence of indexes, to reflect similar flexibility surrounding column referencing across the package.

  • Version 0.1.9 (2024-03-19):
    • New method.

get_display_float_precision() int[source]

Retrieves the current float display precision used exclusively for representing the values of real numbers in the repr method for the SQLDataModel. The default value is 2 decimal places of precision.

Returns:

The current float display precision.

Return type:

int

Note

  • The float display precision is the number of decimal places to include when displaying real numbers in the string representation of the SQLDataModel.

  • This value is utilized in the repr method to control the precision of real number values.

  • The method does not affect the actual value of float dtypes in the underlying SQLDataModel

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

get_display_index() bool[source]

Returns the current value set at SQLDataModel.display_index, which determines whether or not the index is displayed in the SQLDataModel representation.

Returns:

The current value of the display_index property.

Return type:

bool

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get the current value for displaying the index
display_index = df.get_display_index()

# Output: True
print(display_index)

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

get_display_max_rows() int | None[source]

Retrieves the current value at SQLDataModel.display_max_rows, which determines the maximum rows displayed for the SQLDataModel.

Returns:

The current value set at SQLDataModel.display_max_rows.

Return type:

int or None

Example:

import sqldatamodel as sdm

# Create model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get current value
display_max_rows = df.get_display_max_rows()

# By default rows will be limited by current terminal height
print(display_max_rows) # None

Note

  • This does not affect the actual number of rows in the model, only the maximum displayed.

  • Use SQLDataModel.set_display_max_rows() to explicitly set a max row limit instead of using terminal height.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

get_headers() list[str][source]

Returns the current SQLDataModel headers.

Returns:

A list of strings representing the headers.

Return type:

list

Example:

import sqldatamodel as sdm

# Create model
df = sdm.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# Get current model headers
headers = df.get_headers()

# Display values
print(headers) # outputs: ['First Name', 'Last Name', 'Salary']

Changelog:
  • Version 0.1.5 (2023-11-24):
    • New method.

get_indicies() tuple[source]

Returns the current valid row indices for the SQLDataModel instance.

Returns:

A tuple of the current values for SQLDataModel.sql_idx in ascending order.

Return type:

tuple

Example:

import sqldatamodel as sdm

headers = ['Name', 'Age', 'Height']
data = [('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8)]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get current valid indices
valid_indicies = df.get_indicies()

# View results
print(valid_indicies)

This will output:

(0, 1, 2)

Note

  • Primary use is to confirm valid model indexing when starting index != 0 or filtering changes minimum/maximum indexes.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

get_max_column_width() int[source]

Returns the current max_column_width property value.

Returns:

The current value of the max_column_width property.

Return type:

int

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get the current max column width value
max_width = df.get_max_column_width()

# Output
print(max_width)  # 38

get_max_column_width() returns the value in effect for the model; 38 is the constructor default.

get_min_column_width() int[source]

Returns the current min_column_width property value.

Returns:

The current value of the min_column_width property.

Return type:

int

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get and save the current value
min_width = df.get_min_column_width()

# Output
print(min_width)  # 3

get_model_name() str[source]

Returns the SQLDataModel table name currently being used by the model as an alias for any SQL queries executed by the user and internally.

Returns:

The current SQLDataModel table name set by value of attribute SQLDataModel.model_name.

Return type:

str

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['Column1', 'Column2'])

# Get the current name
model_name = df.get_model_name()

# View it
print(f'The model is currently using the table name: {model_name}')

Changelog:
  • Version 0.1.5 (2023-11-24):
    • New method.

get_shape() tuple[int, int][source]

Returns the current shape of the SQLDataModel as a tuple of (rows, columns).

Returns:

A tuple representing the current dimensions of rows and columns in the SQLDataModel.

Return type:

tuple[int, int]

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.SQLDataModel([[1,2,3],
                       [4,5,6],
                       [7,8,9]])

# Get the current shape
shape = df.get_shape()

# View it
print("shape:", shape)

This will output:

shape: (3, 3)

The shape can also be seen when printing the model:

import sqldatamodel as sdm

# Create the model
df = sdm.SQLDataModel([[1,2,3],
                    [4,5,6],
                    [7,8,9]])

# View it and the shape
print(df, "<-- shape is also visible here")

This will output:

┌───┬───────┬───────┬───────┐
│   │ col_0 │ col_1 │ col_2 │
├───┼───────┼───────┼───────┤
│ 0 │     1 │     2 │     3 │
│ 1 │     4 │     5 │     6 │
│ 2 │     7 │     8 │     9 │
└───┴───────┴───────┴───────┘
[3 rows x 3 columns] <-- shape is also visible here

Changelog:
  • Version 0.3.6 (2024-04-09):

classmethod get_supported_sql_connections() tuple[source]

Returns the currently tested DB API 2.0 dialects for use with SQLDataModel.from_sql() method.

Returns:

A tuple of supported DB API 2.0 dialects.

Return type:

tuple

Example:

import sqldatamodel as sdm

# Get supported dialects
supported_dialects = sdm.SQLDataModel.get_supported_sql_connections()

# View details
print(supported_dialects)

# Outputs
# ('sqlite3', 'psycopg2', 'pyodbc', 'cx_oracle', 'teradatasql')

Changelog:
  • Version 0.1.5 (2023-11-24):
    • New method.

group_by(columns: str | list[str], order_by_count: bool = True) SQLDataModel[source]

Returns a new SQLDataModel after performing a group by operation on specified columns.

Parameters:
  • columns (str, list, tuple) – Columns to group by. Accepts either individual strings or a list/tuple of strings.

  • order_by_count (bool, optional) – If True (default), orders the result by count. If False, orders by the specified columns.

Raises:
  • TypeError – If the columns argument is not of type str, list, or tuple.

  • ValueError – If any specified column does not exist in the current model.

  • SQLProgrammingError – If any specified columns or aggregate keywords are invalid or incompatible with the current model.

Returns:

A new SQLDataModel instance containing the result of the group by operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Sarah', 'West', 39, 0.7, '2023-10-01', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female')
]
# Create the model
df = sdm.SQLDataModel(data, headers, display_float_precision=2, display_index=True)

# Group by 'gender' column
df_gender = df.group_by("gender")

# View model
print(df_gender)

This will output:

┌───┬────────┬───────┐
│   │ gender │ count │
├───┼────────┼───────┤
│ 0 │ Male   │     3 │
│ 1 │ Female │     2 │
└───┴────────┴───────┘
[2 rows x 2 columns]

Multiple columns can also be used to group by:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('data.csv')

# Group by multiple columns
df.group_by(["country", "state", "city"])

Note

  • Use order_by_count=False to change ordering from count to column arguments provided.

  • See SQLDataModel.describe() for generating descriptive statistics by column data type.

  • See SQLDataModel.pivot() for creating a pivot table using categorization and aggregate functions.
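
For readers who think in SQL, the grouping above corresponds roughly to a GROUP BY query with a count. The sketch below runs such a query with the standard sqlite3 module; the table name staff and the descending count order are assumptions, and this is not necessarily the statement the method generates.

```python
import sqlite3

data = [
    ('John', 'Male'), ('Sarah', 'Female'), ('Mike', 'Male'),
    ('Pat', 'Male'), ('Kelly', 'Female'),
]

con = sqlite3.connect(':memory:')
con.execute('create table staff (first text, gender text)')
con.executemany('insert into staff values (?, ?)', data)

# Group by gender and order by descending count (ordering direction assumed)
by_count = con.execute(
    'select gender, count(*) as "count" from staff group by gender order by "count" desc'
).fetchall()
print(by_count)   # [('Male', 3), ('Female', 2)]

# Order by the grouping column instead of the count
by_column = con.execute(
    'select gender, count(*) as "count" from staff group by gender order by gender'
).fetchall()
print(by_column)  # [('Female', 2), ('Male', 3)]
```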

Changelog:
  • Version 0.8.0 (2024-06-21):
    • Modified to allow columns to be referenced by their integer index as well as directly to allow broader inputs and reflect similar access patterns across package.

head(n_rows: int = 5) SQLDataModel[source]

Returns the first n_rows of the current SQLDataModel.

Parameters:

n_rows (int, optional) – Number of rows to return. Defaults to 5.

Raises:

TypeError – If n_rows argument is not of type ‘int’ representing the number of rows to return from the head of the model.

Returns:

A new SQLDataModel instance containing the specified number of rows.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Countries data available for sample dataset
url = 'https://developers.google.com/public-data/docs/canonical/countries_csv'

# Create the model
df = sdm.from_html(url)

# Get head of model
df_head = df.head()

# View it
print(df_head)

This will grab the top 5 rows by default:

┌───┬─────────┬──────────┬───────────┬────────────────┐
│   │ country │ latitude │ longitude │ name           │
├───┼─────────┼──────────┼───────────┼────────────────┤
│ 0 │ AF      │  33.9391 │   67.7100 │ Afghanistan    │
│ 1 │ AL      │  41.1533 │   20.1683 │ Albania        │
│ 2 │ DZ      │  28.0339 │    1.6596 │ Algeria        │
│ 3 │ AS      │ -14.2710 │ -170.1322 │ American Samoa │
│ 4 │ AD      │  42.5462 │    1.6016 │ Andorra        │
└───┴─────────┴──────────┴───────────┴────────────────┘
[5 rows x 4 columns]

Note

  • See related SQLDataModel.tail() for the opposite, grabbing the bottom n_rows from the current model.

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

header_master[source]

Maps the current model’s column metadata in the format of 'column_name': ('sql_dtype', 'py_dtype', is_regular_column, 'default_alignment'), updated by SQLDataModel._update_model_metadata().

Type:

dict[str, tuple]

headers[source]

The current column names of the model. If not provided, default column names will be used.

Type:

list[str]

hstack(*other: SQLDataModel, inplace: bool = False) SQLDataModel[source]

Horizontally stacks one or more SQLDataModel objects to the current model.

Parameters:
  • other (SQLDataModel or sequence of) – The SQLDataModel objects to horizontally stack.

  • inplace (bool, optional) – If True, performs the horizontal stacking in-place, modifying the current model. Defaults to False, returning a new SQLDataModel.

Returns:

The horizontally stacked SQLDataModel instance when inplace is False.

Return type:

SQLDataModel

Raises:
  • ValueError – If no additional SQLDataModels are provided for horizontal stacking.

  • TypeError – If any argument in ‘other’ is not of type SQLDataModel, list, or tuple.

  • SQLProgrammingError – If an error occurs when updating the model values in place.

Example:

import sqldatamodel as sdm

# Create models A and B
df_a = sdm.SQLDataModel([('A', 'B'), ('1', '2')], headers=['A1', 'A2'])
df_b = sdm.SQLDataModel([('C', 'D'), ('3', '4')], headers=['B1', 'B2'])

# Horizontally stack B onto A
df_ab = df_a.hstack(df_b)

# View stacked model
print(df_ab)

This will output the result of stacking B onto A, using each model’s headers and dtypes:

┌─────┬─────┬─────┬─────┐
│ A1  │ A2  │ B1  │ B2  │
├─────┼─────┼─────┼─────┤
│ A   │ B   │ C   │ D   │
│ 1   │ 2   │ 3   │ 4   │
└─────┴─────┴─────┴─────┘
[2 rows x 4 columns]

Multiple models can be stacked simultaneously, here we stack a total of 3 models:

# Create a third model C
df_c = sdm.SQLDataModel([('E', 'F'), ('5', '6')], headers=['C1', 'C2'])

# Horizontally stack three models
df_abc = df_a.hstack([df_b, df_c])

# View stacked result
print(df_abc)

This will output the result of stacking C and B onto A:

┌─────┬─────┬─────┬─────┬─────┬─────┐
│ A1  │ A2  │ B1  │ B2  │ C1  │ C2  │
├─────┼─────┼─────┼─────┼─────┼─────┤
│ A   │ B   │ C   │ D   │ E   │ F   │
│ 1   │ 2   │ 3   │ 4   │ 5   │ 6   │
└─────┴─────┴─────┴─────┴─────┴─────┘
[2 rows x 6 columns]

Note

  • Model dimensions will be truncated or padded to coerce compatible dimensions when stacking, use SQLDataModel.merge() for strict SQL joins instead of hstack.

  • Headers and data types are inherited from all the models being stacked, this requires aliasing duplicate column names if present, see utils.alias_duplicates() for aliasing rules.

  • Use setitem syntax such as df['New Column'] = values to create new columns directly in the current model instead of stacking, or see SQLDataModel.add_column_with_values() for a convenience method accomplishing the same.

  • See SQLDataModel.vstack() for vertical stacking.
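
The padding and truncation mentioned above can be pictured with plain tuples. In this sketch the base model's row count wins, missing rows are padded with None and extra rows are dropped; which model decides the final row count is an assumption here, and `hstack_rows` is a hypothetical helper.

```python
def hstack_rows(base: list[tuple], other: list[tuple]) -> list[tuple]:
    """Hypothetical stacking: keep the base row count, padding or truncating other."""
    width = len(other[0]) if other else 0
    stacked = []
    for i, row in enumerate(base):
        # Pad with None when other runs out of rows; extra rows in other are dropped
        extra = other[i] if i < len(other) else (None,) * width
        stacked.append(tuple(row) + tuple(extra))
    return stacked

a = [('A', 'B'), ('1', '2')]
padded = hstack_rows(a, [('C', 'D')])
truncated = hstack_rows(a, [('C', 'D'), ('3', '4'), ('5', '6')])

print(padded)     # [('A', 'B', 'C', 'D'), ('1', '2', None, None)]
print(truncated)  # [('A', 'B', 'C', 'D'), ('1', '2', '3', '4')]
```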

Changelog:
  • Version 0.3.4 (2024-04-05):
    • New method.

indicies[source]

The current valid row indices of the model.

Type:

tuple

infer_dtypes(n_samples: int = 16, date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') None[source]

Infer and set data types for columns based on a random subset of n_samples from the current model. The dateutil library is required for complex date and datetime parsing, if the module is not found then date_format and datetime_format will be used for dates and datetimes respectively.

Parameters:
  • n_samples (int) – The number of random samples to use for data type inference. Default set to 16.

  • date_format (str) – The format string to use for parsing date values if dateutil library is not found. Default is ‘%Y-%m-%d’.

  • datetime_format (str) – The format string to use for parsing datetime values if dateutil library is not found. Default is ‘%Y-%m-%d %H:%M:%S’.

Raises:
  • TypeError – If argument for n_samples is not of type int or if argument for date_format or datetime_format is not of type ‘str’.

  • ValueError – If the current model contains zero columns from which to infer types.

  • DimensionError – If the current model contains insufficient rows to sample from.

Returns:

Inferred column types are updated and None is returned.

Return type:

None

Example:

import sqldatamodel as sdm

# Sample data of str values containing probable datatypes
headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', '27', '1.22', '2023-02-01'),
    ('Sarah', 'West', '39', '0.7', '2023-10-01'),
    ('Mike', 'Harlin', '36', '3.9', '2020-08-27'),
    ('Pat', 'Douglas', '42', '11.5', '2015-11-06'),
    ('Kelly', 'Lee', '32', '8.0', '2016-09-18')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get current column dtypes for reference
dtypes_before = df.get_column_dtypes()

# Infer and set data types based on 10 random samples
df.infer_dtypes(n_samples=10)

# View updated model
print(df)

This will output data with dtypes correctly aligned:

┌───────┬─────────┬──────┬─────────┬────────────┐
│ first │ last    │  age │ service │ hire_date  │
├───────┼─────────┼──────┼─────────┼────────────┤
│ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │
│ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │
│ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │
│ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │
│ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │
└───────┴─────────┴──────┴─────────┴────────────┘
[5 rows x 5 columns]

Use SQLDataModel.get_column_dtypes() or SQLDataModel.dtypes to view current types:

# Get new column types to confirm
dtypes_after = df.get_column_dtypes()

# View updated dtypes
for col in df.headers:
    print(f"{col + ':':<11} {dtypes_before[col]} -> {dtypes_after[col]}")

This will output:

first:      str -> str
last:       str -> str
age:        str -> int
service:    str -> float
hire_date:  str -> date

Note

  • If a single str instance is found in the samples, the corresponding column dtype will remain as str to avoid data loss.

  • Co-occurrences of int & float, or date & datetime will favor the superset dtype after infer_threshold is met, so float and datetime respectively.

  • If a single datetime instance is found amongst a higher proportion of date dtypes, datetime will be used according to second rule.

  • If a single float instance is found amongst a higher proportion of int dtypes, float will be used according to second rule.

  • Ties between dtypes are broken according to current type < str < float < int < datetime < date < bytes < None

  • This method calls SQLDataModel.set_column_dtypes() once the column dtypes have been inferred if they differ from the current dtype.

  • See SQLDataModel.infer_str_type() for type determination process.

  • See utils.infer_types_from_data() for type voting scheme used for inference.
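
The first few rules above can be sketched with the standard library. This is a simplified illustration, not the package's voting scheme: `classify_string` and `infer_column_type` are hypothetical names, no threshold or tie-breaking is modelled, and only the fixed format strings are tried.

```python
from datetime import datetime

def classify_string(value: str, date_format: str = '%Y-%m-%d',
                    datetime_format: str = '%Y-%m-%d %H:%M:%S') -> str:
    """Hypothetical classifier: name the first type a string parses as."""
    for caster, name in ((int, 'int'), (float, 'float')):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    for fmt, name in ((date_format, 'date'), (datetime_format, 'datetime')):
        try:
            datetime.strptime(value, fmt)
            return name
        except ValueError:
            pass
    return 'str'

def infer_column_type(samples: list[str]) -> str:
    found = {classify_string(s) for s in samples}
    if not found or 'str' in found:
        return 'str'      # a single str keeps the column as str
    if found <= {'int', 'float'}:
        return 'float' if 'float' in found else 'int'
    if found <= {'date', 'datetime'}:
        return 'datetime' if 'datetime' in found else 'date'
    return 'str'          # mixed families fall back to str (assumption)

print(infer_column_type(['27', '39', '36']))                    # int
print(infer_column_type(['1.22', '8', '3.9']))                  # float
print(infer_column_type(['2023-02-01', '2016-09-18 10:00:00'])) # datetime
print(infer_column_type(['27', 'n/a']))                         # str
```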

Changelog:
  • Version 0.2.0 (2024-03-19):
    • Increased sampling size for inference from n_samples=10 to n_samples=16 for better resolution.

  • Version 0.1.9 (2024-03-19):
    • New method.

insert_row(index: int, values: list | tuple, on_conflict: Literal['replace', 'ignore'] = 'replace') None[source]

Inserts a new row into the SQLDataModel at the specified index with the provided values.

Parameters:
  • index (int) – The position at which to insert the row.

  • values (list or tuple) – The values to be inserted into the row.

  • on_conflict (Literal['replace', 'ignore'], optional) – Specifies the action to take if the index already exists. Default is ‘replace’.

Raises:
  • TypeError – If index is not an integer or values is not a list or tuple.

  • ValueError – If on_conflict is not 'replace' or 'ignore'.

  • DimensionError – If the dimensions of the provided values are incompatible with the current model dimensions.

  • SQLProgrammingError – If there is an issue with the SQL execution during the insertion.

Returns:

None

Example:

import sqldatamodel as sdm

# Sample data
data = [('Alice', 20, 'F'), ('Billy', 25, 'M'), ('Chris', 30, 'M')]

# Create the model
df = sdm.SQLDataModel(data, headers=['Name','Age','Sex'])

# Insert a new row at index 3
df.insert_row(3, ['David', 35, 'M'])

# Insert or replace row at index 1
df.insert_row(1, ['Beth', 27, 'F'], on_conflict='replace')

# View result
print(df)

This will output the modified model:

┌───┬───────┬─────┬─────┐
│   │ Name  │ Age │ Sex │
├───┼───────┼─────┼─────┤
│ 0 │ Alice │  20 │ F   │
│ 1 │ Beth  │  27 │ F   │
│ 2 │ Chris │  30 │ M   │
│ 3 │ David │  35 │ M   │
└───┴───────┴─────┴─────┘
[4 rows x 3 columns]

Note

  • Use on_conflict = 'ignore' to take no action if row already exists, and on_conflict = 'replace' to replace it.

  • See SQLDataModel.append_row() for appending rows at the next available index instead of insertion at index.
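The two on_conflict modes resemble SQLite's own conflict clauses. The sketch below uses only the standard library sqlite3 module to show the difference; the people table and the insert_row helper are assumptions made for illustration and do not reflect how SQLDataModel stores its data internally.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (idx INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
con.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(0, "Alice", 20), (1, "Billy", 25), (2, "Chris", 30)],
)

def insert_row(index, values, on_conflict="replace"):
    """Hypothetical helper mapping on_conflict onto an SQLite conflict clause."""
    if on_conflict not in ("replace", "ignore"):
        raise ValueError("on_conflict must be 'replace' or 'ignore'")
    # The keyword was validated above, so interpolating it is safe here.
    sql = f"INSERT OR {on_conflict.upper()} INTO people VALUES (?, ?, ?)"
    con.execute(sql, (index, *values))

insert_row(1, ["Beth", 27], on_conflict="replace")  # overwrites Billy
insert_row(2, ["Carl", 31], on_conflict="ignore")   # Chris is kept
insert_row(3, ["David", 35])                        # new index, plain insert

rows = con.execute("SELECT * FROM people ORDER BY idx").fetchall()
print(rows)
```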

Changelog:
  • Version 0.6.0 (2024-05-14):
    • Added index and on_conflict parameters for greater specificity and to align with broader conventions surrounding insert methods.

  • Version 0.1.9 (2024-03-19):
    • New method.

isna() set[int][source]

Return the row indices containing null values from the current model.

Returns:

Set of row indices containing null values.

Return type:

set[int]

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Gender', 'City']
data = [
    ('Sarah', 35, 'Female', 'Houston'),
    ('Alice', None, 'Female', 'Milwaukee'),
    ('Mike', None, 'Male', 'Atlanta'),
    ('John', 25, 'Male', 'Boston'),
    ('Bob', None, 'Male', 'Chicago'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter for rows where 'Age' is null
df = df[df['Age'].isna()]

# View result
print(df)

This will output the result containing the rows where ‘Age’ was null:

┌───────┬─────┬────────┬───────────┐
│ Name  │ Age │ Gender │ City      │
├───────┼─────┼────────┼───────────┤
│ Alice │     │ Female │ Milwaukee │
│ Mike  │     │ Male   │ Atlanta   │
│ Bob   │     │ Male   │ Chicago   │
└───────┴─────┴────────┴───────────┘
[3 rows x 4 columns]

This can be used in combination with the setitem syntax to selectively update values as well:

# Filter and set the null values
df[df['Age'].isna(), 'Age'] = 'Missing'

Note

  • A value is treated as null (NA-like) if it is SQL NULL or the Python equivalent None, evaluated for all values in the row.

  • See related SQLDataModel.notna() to filter for rows containing values that are not null.

  • See SQLDataModel.fillna() to fill all missing or null values in the model.
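As a rough picture of what a set of null row indices looks like, the following sketch runs the equivalent IS NULL and IS NOT NULL filters with the standard library sqlite3 module. The people table and its idx column are assumptions for illustration, not the library's internal schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (idx INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
con.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(0, "Sarah", 35), (1, "Alice", None), (2, "Mike", None), (3, "John", 25), (4, "Bob", None)],
)

# Python None is stored as SQL NULL, so IS NULL finds the missing ages.
null_idx = {idx for (idx,) in con.execute("SELECT idx FROM people WHERE age IS NULL")}

# The complementary filter corresponds to the related notna() method.
not_null_idx = {idx for (idx,) in con.execute("SELECT idx FROM people WHERE age IS NOT NULL")}

print(null_idx, not_null_idx)
```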

Changelog:
  • Version 0.7.2 (2024-06-11):
    • New method.

iter_rows(min_row: int = None, max_row: int = None, index: bool = True, include_headers: bool = False) Iterator[tuple][source]

Returns an iterator over the specified rows in the current SQLDataModel.

Parameters:
  • min_row (int, optional) – The minimum row index to start iterating from (inclusive). Defaults to None.

  • max_row (int, optional) – The maximum row index to iterate up to (exclusive). Defaults to None.

  • index (bool, optional) – Whether to include the row index in the output. Defaults to True.

  • include_headers (bool, optional) – Whether to include headers as the first row. Defaults to False.

Yields:

Iterator[tuple] – An iterator containing the rows from the specified range with headers as the first row if specified.

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['First', 'Last', 'Salary'])

# Iterate over the rows
for row in df.iter_rows(min_row=2, max_row=4):
    pass # Do stuff

Note

  • Rows are referenced by their index and not their value. E.g., min_row = 0 and max_row = -1 will reference the first and last rows, respectively.

  • See SQLDataModel.iter_tuples() for iterating over rows as named tuples.

Changelog:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.9 (2024-03-19):
    • New method.

iter_tuples(index: bool = False) Iterator[NamedTuple][source]

Returns an iterator of rows from the current SQLDataModel as namedtuples using headers as field names.

Parameters:

index (bool, optional) – Whether to include the index column in the namedtuples. Default is False.

Raises:

ValueError – Raised if headers are not valid Python identifiers. Use SQLDataModel.normalize_headers() method to fix.

Yields:

Iterator[NamedTuple] – An iterator of namedtuples for each row using current headers for field names.

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['First', 'Last', 'Salary'])

# Iterate over the namedtuples
for row_tuple in df.iter_tuples(index=True):
    pass # Do stuff with namedtuples
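
The ValueError listed above follows from how named tuples work in Python: field names must be valid identifiers. A minimal standalone sketch using collections.namedtuple, where the iter_tuples function and the sample rows are hypothetical stand-ins rather than the library's code:

```python
from collections import namedtuple

def iter_tuples(headers, rows):
    """Yield each row as a namedtuple whose fields are the headers."""
    # namedtuple() raises ValueError when a header is not a valid identifier.
    Row = namedtuple("Row", headers)
    for row in rows:
        yield Row(*row)

rows = [("John", "Smith", 50000), ("Sarah", "West", 62000)]
first = next(iter_tuples(["First", "Last", "Salary"], rows))
print(first.Last)

error = None
try:
    # A header containing a space cannot be used as a field name.
    next(iter_tuples(["First Name", "Salary"], [("John", 50000)]))
except ValueError as exc:
    error = exc
print(error)
```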

Changelog:
  • Version 0.10.0 (2024-06-29):
  • Version 0.1.9 (2024-03-19):
    • New method.

max() SQLDataModel[source]

Returns a new SQLDataModel containing the maximum value of all non-null values for each column in a row-wise orientation.

Returns:

A new SQLDataModel containing the maximum non-null value for each column.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data with missing values
headers = ['Name', 'Age', 'Gender', 'Tenure']
data = [
    ('Alice', 25, 'Female', 1.0),
    ('Bob', None, 'Male', 2.7),
    ('Charlie', 30, 'Male', None),
    ('David', None, 'Male', 3.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get maximum values
max_values = df.max()

# View result
print(max_values)

This will output the maximum value of all non-null values for each column:

┌───────┬─────┬────────┬────────┐
│ Name  │ Age │ Gender │ Tenure │
├───────┼─────┼────────┼────────┤
│ David │  30 │ Male   │   3.80 │
└───────┴─────┴────────┴────────┘
[1 rows x 4 columns]

Changelog:
  • Version 0.3.1 (2024-04-01):
    • New method.

max_column_width[source]

The maximum column width in characters to use for string representations of the data. Default is 38.

Type:

int

mean() SQLDataModel[source]

Returns a new SQLDataModel containing the mean value of all viable columns in the current model, calculated as sum(x_1, ..., x_N) * (1 / N), where N is the number of non-null values in the column.

Returns:

A new SQLDataModel containing the mean values of each column.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Birthday', 'Height', 'Date of Hire']
data = [
    ('John', 30, '1994-06-15', 175.3, '2018-03-03 11:20:19'),
    ('Alice', 28, '1996-11-20', 162.0, '2023-04-24 08:45:30'),
    ('Travis', 37, '1987-01-07', 185.8, '2012-10-06 15:30:40')
]

# Create the model and infer correct types
df = sdm.SQLDataModel(data, headers, infer_types=True)

# View full model
print(df)

This will output the sample model we’ll be using to calculate mean values for:

┌────────┬─────┬────────────┬─────────┬─────────────────────┐
│ Name   │ Age │ Birthday   │  Height │ Date of Hire        │
├────────┼─────┼────────────┼─────────┼─────────────────────┤
│ John   │  30 │ 1994-06-15 │  175.30 │ 2018-03-03 11:20:19 │
│ Alice  │  28 │ 1996-11-20 │  162.00 │ 2023-04-24 08:45:30 │
│ Travis │  37 │ 1987-01-07 │  185.80 │ 2012-10-06 15:30:40 │
└────────┴─────┴────────────┴─────────┴─────────────────────┘
[3 rows x 5 columns]

Now let’s find the mean values:

# Calculate the mean values
df_mean = df.mean()

# View result
print(df_mean)

This will output the mean values for the “Age”, “Birthday”, “Height” and “Date of Hire” columns:

┌──────┬────────┬────────────┬─────────┬─────────────────────┐
│ Name │    Age │ Birthday   │  Height │ Date of Hire        │
├──────┼────────┼────────────┼─────────┼─────────────────────┤
│ NaN  │  31.67 │ 1992-10-14 │  174.37 │ 2018-01-30 11:52:09 │
└──────┴────────┴────────────┴─────────┴─────────────────────┘
[1 rows x 5 columns]

Note

  • Only non-null values are included in the calculation of the sum and the total number of values in the column, use SQLDataModel.fillna() to fill null values.

  • For date and datetime columns values are converted to julian days prior to calculation and recast into original data type, some imprecision may occur as a result.

  • See SQLDataModel.min() for returning the minimum value, SQLDataModel.max() for maximum value, and SQLDataModel.describe() for descriptive statistical values.
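The julian day conversion mentioned in the notes above can be tried directly with SQLite's date functions. This is a sketch of the idea using the standard library sqlite3 module, with a hypothetical staff table; it is not necessarily the exact query the method runs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE staff (birthday TEXT)")
con.executemany(
    "INSERT INTO staff VALUES (?)",
    [("1994-06-15",), ("1996-11-20",), ("1987-01-07",), (None,)],
)

# julianday() maps each date to a float, avg() skips the NULL row,
# and date() casts the averaged julian day back to a date string.
(mean_birthday,) = con.execute(
    "SELECT date(avg(julianday(birthday))) FROM staff"
).fetchone()

print(mean_birthday)
```

The NULL row does not affect the result, which matches the first note: only non-null values contribute to the sum and to the count.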

Changelog:
  • Version 0.3.7 (2024-04-10):
    • New method.

merge(merge_with: SQLDataModel, how: Literal['left', 'right', 'inner', 'full outer', 'cross'] = 'left', left_on: str = None, right_on: str = None, include_join_column: bool = False) SQLDataModel[source]

Merges two SQLDataModel instances based on specified columns and merge type, how, returning the result as a new instance. If the join column shares the same name in both models, left_on and right_on column arguments are not required and will be inferred. Otherwise, explicit arguments for both are required.

Parameters:
  • merge_with (SQLDataModel) – The SQLDataModel to merge with the current model.

  • how (Literal["left", "right", "inner", "full outer", "cross"]) – The type of merge to perform.

  • left_on (str) – The column name from the current model to use as the left join key.

  • right_on (str) – The column name from the merge_with model to use as the right join key.

  • include_join_column (bool) – If the shared column being used as the join key should be included from both tables. Default is False.

Raises:
  • TypeError – If merge_with is not of type SQLDataModel.

  • SQLProgrammingError – If the sqlite3 version is < 3.39.0 and the join type is ‘right’ or ‘full outer’, which older versions do not support.

  • DimensionError – If no shared column exists, and explicit left_on and right_on arguments are not provided.

  • ValueError – If the specified left_on or right_on column is not found in the respective models.

Returns:

A new SQLDataModel containing the merged result.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Left table data with ID column
left_headers = ["Name", "Age", "ID"]
left_data = [
    ["Bob", 35, 1],
    ["Alice", 30, 5],
    ["David", 40, None],
    ["Charlie", 25, 2]
]
# Right table data with shared ID column
right_headers = ["ID", "Country"]
right_data = [
    [1, "USA"],
    [2, "Germany"],
    [3, "France"],
    [4, "Latvia"]
]

# Create the left and right tables
df_left = sdm.SQLDataModel(left_data, left_headers)
df_right = sdm.SQLDataModel(right_data, right_headers)

Here are the left and right tables we will be joining:

Left Table:                     Right Table:
┌─────────┬──────┬──────┐       ┌──────┬─────────┐
│ Name    │  Age │   ID │       │   ID │ Country │
├─────────┼──────┼──────┤       ├──────┼─────────┤
│ Bob     │   35 │    1 │       │    1 │ USA     │
│ Alice   │   30 │    5 │       │    2 │ Germany │
│ David   │   40 │      │       │    3 │ France  │
│ Charlie │   25 │    2 │       │    4 │ Latvia  │
└─────────┴──────┴──────┘       └──────┴─────────┘
[4 rows x 3 columns]            [4 rows x 2 columns]

Left Join

# Create a model by performing a left join with the tables
df_joined = df_left.merge(df_right, how="left")

# View result
print(df_joined)

This will output:

Left Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Alice   │   30 │    5 │         │
│ David   │   40 │      │         │
│ Charlie │   25 │    2 │ Germany │
└─────────┴──────┴──────┴─────────┘
[4 rows x 4 columns]

Right Join

# Create a model by performing a right join with the tables
df_joined = df_left.merge(df_right, how="right")

# View result
print(df_joined)

This will output:

Right Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Charlie │   25 │    2 │ Germany │
│         │      │      │ France  │
│         │      │      │ Latvia  │
└─────────┴──────┴──────┴─────────┘
[4 rows x 4 columns]

Inner Join

# Create a model by performing an inner join with the tables
df_joined = df_left.merge(df_right, how="inner")

# View result
print(df_joined)

This will output:

Inner Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Charlie │   25 │    2 │ Germany │
└─────────┴──────┴──────┴─────────┘
[2 rows x 4 columns]

Full Outer Join

# Create a model by performing a full outer join with the tables
df_joined = df_left.merge(df_right, how="full outer")

# View result
print(df_joined)

This will output:

Full Outer Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Alice   │   30 │    5 │         │
│ David   │   40 │      │         │
│ Charlie │   25 │    2 │ Germany │
│         │      │      │ France  │
│         │      │      │ Latvia  │
└─────────┴──────┴──────┴─────────┘
[6 rows x 4 columns]

Cross Join

# Create a model by performing a cross join with the tables
df_joined = df_left.merge(df_right, how="cross")

# View result
print(df_joined)

This will output:

Cross Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Bob     │   35 │    1 │ Germany │
│ Bob     │   35 │    1 │ France  │
│ Bob     │   35 │    1 │ Latvia  │
│ Alice   │   30 │    5 │ USA     │
│ Alice   │   30 │    5 │ Germany │
│ Alice   │   30 │    5 │ France  │
│ Alice   │   30 │    5 │ Latvia  │
│ David   │   40 │      │ USA     │
│ David   │   40 │      │ Germany │
│ David   │   40 │      │ France  │
│ David   │   40 │      │ Latvia  │
│ Charlie │   25 │    2 │ USA     │
│ Charlie │   25 │    2 │ Germany │
│ Charlie │   25 │    2 │ France  │
│ Charlie │   25 │    2 │ Latvia  │
└─────────┴──────┴──────┴─────────┘
[16 rows x 4 columns]

Note

  • If include_join_column=False then only the left_on join column is included in the result, with the right_on column removed to avoid redundant shared key values.

  • If include_join_column=True then all the columns from both models are included in the result, with aliasing to avoid naming conflicts, see utils.alias_duplicates() for details.

  • The resulting SQLDataModel is created based on the sqlite3 join definition and specified columns and merge type, for details see sqlite3 documentation.

  • See SQLDataModel.hstack() for horizontally stacking SQLDataModel using shared row dimensions.

  • See SQLDataModel.vstack() for vertically stacking SQLDataModel using shared column dimensions.
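Since the result follows the sqlite3 join definition, the same tables can be joined with plain SQL to see where the empty cells come from. The sketch below uses only the standard library sqlite3 module; the table names lhs and rhs are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lhs (Name TEXT, Age INTEGER, ID INTEGER)")
con.execute("CREATE TABLE rhs (ID INTEGER, Country TEXT)")
con.executemany(
    "INSERT INTO lhs VALUES (?, ?, ?)",
    [("Bob", 35, 1), ("Alice", 30, 5), ("David", 40, None), ("Charlie", 25, 2)],
)
con.executemany(
    "INSERT INTO rhs VALUES (?, ?)",
    [(1, "USA"), (2, "Germany"), (3, "France"), (4, "Latvia")],
)

# Only the left ID column is selected, similar to include_join_column=False.
query = (
    "SELECT lhs.Name, lhs.Age, lhs.ID, rhs.Country "
    "FROM lhs {how} JOIN rhs ON lhs.ID = rhs.ID ORDER BY lhs.rowid"
)
left_join = con.execute(query.format(how="LEFT")).fetchall()
inner_join = con.execute(query.format(how="INNER")).fetchall()

print(left_join)
print(inner_join)
```

David's NULL key never matches because NULL = NULL is not true in SQL, so his Country stays empty in the left join and he is dropped from the inner join.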

Changelog:
  • Version 0.10.2 (2024-06-30):
    • Changed merge_with from keyword argument to positional argument to reflect argument is required and not optional.

  • Version 0.10.1 (2024-06-29):
    • Modified to raise SQLProgrammingError if available sqlite3 version < 3.39.0 and join type is one of ‘right’ or ‘full outer’, which was not supported by older versions.

  • Version 0.1.9 (2024-03-19):
    • New method.

min() SQLDataModel[source]

Returns a new SQLDataModel containing the minimum value of all non-null values for each column in a row-wise orientation.

Returns:

A new SQLDataModel containing the minimum non-null value for each column.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data with missing values
headers = ['Name', 'Age', 'Gender', 'Tenure']
data = [
    ('Alice', 25, 'Female', 1.0),
    ('Bob', None, 'Male', 2.7),
    ('Charlie', 30, 'Male', None),
    ('David', None, 'Male', 3.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get minimum values
min_values = df.min()

# View result
print(min_values)

This will output the minimum value of all non-null values for each column:

┌───────┬─────┬────────┬────────┐
│ Name  │ Age │ Gender │ Tenure │
├───────┼─────┼────────┼────────┤
│ Alice │  25 │ Female │   1.00 │
└───────┴─────┴────────┴────────┘
[1 rows x 4 columns]
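
For comparison, SQLite's MIN and MAX aggregates also skip NULL values, which is the behavior described for both min() and max(). A sketch with the standard library sqlite3 module and a hypothetical staff table, not the library's own query:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE staff (Name TEXT, Age INTEGER, Gender TEXT, Tenure REAL)")
con.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [
        ("Alice", 25, "Female", 1.0),
        ("Bob", None, "Male", 2.7),
        ("Charlie", 30, "Male", None),
        ("David", None, "Male", 3.8),
    ],
)

# Aggregates ignore NULL, so each column is reduced over its non-null values only.
min_row = con.execute(
    "SELECT MIN(Name), MIN(Age), MIN(Gender), MIN(Tenure) FROM staff"
).fetchone()
max_row = con.execute(
    "SELECT MAX(Name), MAX(Age), MAX(Gender), MAX(Tenure) FROM staff"
).fetchone()

print(min_row)
print(max_row)
```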

Changelog:
  • Version 0.3.1 (2024-04-01):
    • New method.

min_column_width[source]

The minimum column width in characters to use for string representations of the data. Default is 3.

Type:

int

normalize_headers(apply_function: Callable = None) None[source]

Reformats the current SQLDataModel headers into an uncased normalized form using alphanumeric characters only. Wraps SQLDataModel.set_headers().

Parameters:

apply_function (Callable, optional) – Specify an alternative normalization pattern. When None, the pattern '[^0-9a-z _]+' will be used on uncased values.

Returns:

None

Example:

import sqldatamodel as sdm

# Create model
df = sdm.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# Use default normalization scheme, uncased and strips invalid SQL identifiers
df.normalize_headers()

# Get renamed headers after default normalization
df.get_headers() # now outputs ['first_name', 'last_name', 'salary']

# Or use custom renaming scheme
df.normalize_headers(lambda x: x.upper())

# Get renamed headers again
df.get_headers() # now outputs ['FIRST_NAME', 'LAST_NAME', 'SALARY']

Changelog:
  • Version 1.2.0 (2025-01-28):
    • Added duplicate aliasing to prevent post-normalization name collisions using utils.alias_duplicates()

    • Modified default normalization function to better handle occurrences of multiple invalid characters.

  • Version 0.1.5 (2023-11-24):
    • New method.
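
The default behavior of normalize_headers() can be approximated with the documented pattern '[^0-9a-z _]+'. In the sketch below the normalize function is hypothetical, and joining the remaining words with underscores is an assumption inferred from the example output above, not a confirmed detail of the library.

```python
import re

def normalize(header: str) -> str:
    """Lowercase, drop characters outside the pattern, join words with underscores."""
    cleaned = re.sub(r"[^0-9a-z _]+", "", header.lower())
    # Assumption: whitespace runs collapse into a single underscore.
    return "_".join(cleaned.split())

normalized = [normalize(h) for h in ["First Name", "Last Name", "Salary ($)"]]
print(normalized)
```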

notna() set[int][source]

Return the row indices that do not contain null values from the current model.

Returns:

Set of row indices containing values that are not null.

Return type:

set[int]

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Gender', 'City']
data = [
    ('Sarah', 35, 'Female', 'Houston'),
    ('Alice', None, 'Female', 'Milwaukee'),
    ('Mike', None, 'Male', 'Atlanta'),
    ('John', 25, 'Male', 'Boston'),
    ('Bob', None, 'Male', 'Chicago'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter for rows where 'Age' is not null
df = df[df['Age'].notna()]

# View result
print(df)

This will output the result containing the rows where ‘Age’ was not null:

┌───────┬─────┬────────┬─────────┐
│ Name  │ Age │ Gender │ City    │
├───────┼─────┼────────┼─────────┤
│ Sarah │  35 │ Female │ Houston │
│ John  │  25 │ Male   │ Boston  │
└───────┴─────┴────────┴─────────┘
[2 rows x 4 columns]

This can be used in combination with the setitem syntax to selectively update values as well:

# Create a 'Notes' column with a default value
df['Notes'] = 'Missing'

# Filter and set the values that are not null
df[df['Age'].notna(), 'Notes'] = 'Valid'

Note

  • A value is treated as not null if it is neither SQL NULL nor the Python equivalent None, evaluated for any values in the row.

  • See related SQLDataModel.isna() to filter for rows containing values that are null.

  • See SQLDataModel.fillna() to fill all missing or null values in the model.

Changelog:
  • Version 0.7.2 (2024-06-11):
    • New method.

pivot(pivot_column: str, category_column: str, amount_column: str | list[str] = None, fill_value: Any = None, agg_func: Literal['sum', 'avg', 'min', 'max'] = 'sum') SQLDataModel[source]

Create a pivot table using the columns specified and return the result as a new SQLDataModel.

The pivot method transforms the data in the SQLDataModel into a pivot table format summarizing the values of one column amount_column based on unique values from two other columns, the pivot_column and the category_column

Parameters:
  • pivot_column (str) – Column to pivot on. The unique values in this column will form the rows of the pivot table.

  • category_column (str) – Column to categorize the data. The unique values in this column will form the columns of the pivot table.

  • amount_column (str, list, optional) – Column(s) to aggregate. Accepts either a single string or a list of strings. Defaults to all numeric columns if not provided.

  • fill_value (Any, optional) – Value to fill when there is no data for a particular category. Defaults to None.

  • agg_func (Literal['sum', 'avg', 'min', 'max'], optional) – Aggregate function to use. Defaults to ‘sum’.

Raises:
  • TypeError – If arguments for columns are not of type ‘str’, ‘int’ or list, representing column name or integer index.

  • ValueError – If there are insufficient numeric columns for aggregation, invalid aggregate function, or insufficient distinct values in the category column.

Returns:

A new SQLDataModel instance containing the result of the pivot operation.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Product', 'Units', 'Qtr','Sales']
data = [
    ('Chainsaw', 75, 'Q1', 1500),
    ('Chainsaw', 80, 'Q2', 1600),
    ('Chainsaw', 78, 'Q3', 1550),
    ('Chainsaw', 79, 'Q4', 1580),
    ('Hammer', 40, 'Q1', 800),
    ('Hammer', 42, 'Q2', 850),
    ('Hammer', 41, 'Q3', 820),
    ('Hammer', 42, 'Q4', 830),
    ('Drill', 50, 'Q1', 1000),
    ('Drill', 55, 'Q2', 1100),
    ('Drill', 52, 'Q3', 1050),
    ('Drill', 54, 'Q4', 1080)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Pivot 'Product' by quarterly sales
quarterly_sales = df.pivot('Product', 'Qtr', 'Sales')

# View result
print(quarterly_sales)

This will output a wide pivot table summing up ‘Sales’ by ‘Qtr’ for each ‘Product’:

┌───┬──────────┬──────┬──────┬──────┬──────┐
│   │ Product  │   Q1 │   Q2 │   Q3 │   Q4 │
├───┼──────────┼──────┼──────┼──────┼──────┤
│ 0 │ Chainsaw │ 1500 │ 1600 │ 1550 │ 1580 │
│ 1 │ Drill    │ 1000 │ 1100 │ 1050 │ 1080 │
│ 2 │ Hammer   │  800 │  850 │  820 │  830 │
└───┴──────────┴──────┴──────┴──────┴──────┘
[3 rows x 5 columns]

Multiple columns can be aggregated to produce a wider pivot:

# This time pivot by 'Sales' and 'Units'
quarterly_metrics = df.pivot('Product', 'Qtr', ['Units','Sales'])

# View new pivot
print(quarterly_metrics)

When multiple aggregates are used, columns are labeled ‘<category_column> <amount_column>’:

┌───┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│   │ Product  │ Q1 Units │ Q1 Sales │ Q2 Units │ Q2 Sales │ Q3 Units │ Q3 Sales │ Q4 Units │ Q4 Sales │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ 0 │ Chainsaw │       75 │     1500 │       80 │     1600 │       78 │     1550 │       79 │     1580 │
│ 1 │ Drill    │       50 │     1000 │       55 │     1100 │       52 │     1050 │       54 │     1080 │
│ 2 │ Hammer   │       40 │      800 │       42 │      850 │       41 │      820 │       42 │      830 │
└───┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
[3 rows x 9 columns]

Note

  • When no amount_column is provided, all numeric column types will be used.

  • All column arguments can be specified by their integer index as well as by their name.

  • Use fill_value to provide a default value when there is no data for a particular category.

  • The aggregate function specified in agg_func must be one of ‘sum’, ‘avg’, ‘min’, or ‘max’, with sum used by default.

  • See SQLDataModel.transpose() for transposing rows and columns directly.

  • See SQLDataModel.group_by() for regular aggregation.
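A pivot of this kind is commonly written in SQL as conditional aggregation, one SUM(CASE ...) expression per category. The sketch below shows that general technique with the standard library sqlite3 module and the sample data above; the sales table name is an assumption, and the query the method actually generates may differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (Product TEXT, Units INTEGER, Qtr TEXT, Sales INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Chainsaw", 75, "Q1", 1500), ("Chainsaw", 80, "Q2", 1600),
        ("Chainsaw", 78, "Q3", 1550), ("Chainsaw", 79, "Q4", 1580),
        ("Hammer", 40, "Q1", 800), ("Hammer", 42, "Q2", 850),
        ("Hammer", 41, "Q3", 820), ("Hammer", 42, "Q4", 830),
        ("Drill", 50, "Q1", 1000), ("Drill", 55, "Q2", 1100),
        ("Drill", 52, "Q3", 1050), ("Drill", 54, "Q4", 1080),
    ],
)

# The distinct category values become the output columns.
quarters = [q for (q,) in con.execute("SELECT DISTINCT Qtr FROM sales ORDER BY Qtr")]

# One conditional aggregate per category, with the category bound as a parameter.
aggregates = ", ".join("SUM(CASE WHEN Qtr = ? THEN Sales END)" for _ in quarters)
pivot = con.execute(
    f"SELECT Product, {aggregates} FROM sales GROUP BY Product ORDER BY Product",
    quarters,
).fetchall()

print(pivot)
```

A category with no matching rows would yield NULL here, which is the gap that fill_value is described as filling.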

Changelog:
  • Version 0.10.3 (2024-07-01):
    • New method.

rename_column(column: int | str, new_column_name: str) None[source]

Renames a column in the SQLDataModel at the specified index or using the old column name into the new value provided in new_column_name.

Parameters:
  • column (int|str) – The index or current str value of the column to be renamed.

  • new_column_name (str) – The new name as a str value for the specified column.

Raises:
  • TypeError – If the column or new_column_name parameters are invalid types.

  • IndexError – If the provided column index is outside the current column range.

  • SQLProgrammingError – If there is an issue with the SQL execution during the column renaming.

Example:

import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27)
    ,(1, 'sarah', 'west', 29)
    ,(2, 'mike', 'harlin', 36)
    ,(3, 'pat', 'douglas', 42)
]

# Create the model with sample data
df = sdm.SQLDataModel(data, headers)

# Example: Rename the column at index 1 to 'first_name'
df.rename_column(1, 'first_name')

# Get current values
new_headers = df.get_headers()

# Outputs ['first_name', 'last', 'age']
print(new_headers)

Note

  • The method allows renaming a column identified by its index in the SQLDataModel.

  • Handles negative indices by adjusting them relative to the end of the column range.

  • If an error occurs during SQL execution, it rolls back the changes and raises a SQLProgrammingError with an informative message.
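The index handling described in these notes can be sketched in a few lines. The resolve_column helper below is hypothetical and only illustrates how a name or a possibly negative index might be mapped to a column position before renaming.

```python
from __future__ import annotations

def resolve_column(headers: list[str], column: int | str) -> int:
    """Map a column name or (possibly negative) index to a valid position."""
    if isinstance(column, str):
        return headers.index(column)  # raises ValueError if the name is unknown
    if not isinstance(column, int):
        raise TypeError("column must be of type 'int' or 'str'")
    if column < 0:
        # Negative indices count back from the end of the column range.
        column += len(headers)
    if not 0 <= column < len(headers):
        raise IndexError("column index is outside the current column range")
    return column

headers = ["first", "last", "age"]
headers[resolve_column(headers, -1)] = "years"
headers[resolve_column(headers, "first")] = "first_name"
print(headers)
```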

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

rename_header(header: int | str, new_header_name: str) None[source]

Renames a header in the SQLDataModel at the specified index or using the old header name into the new value provided in new_header_name.

Parameters:
  • header (int|str) – The index or current str value of the header to be renamed.

  • new_header_name (str) – The new name as a str value for the specified header.

Raises:
  • TypeError – If the header or new_header_name parameters are invalid types.

  • IndexError – If the provided header index is outside the current column range.

  • SQLProgrammingError – If there is an issue with the SQL execution during the header renaming.

Example:

import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27)
    ,(1, 'sarah', 'west', 29)
    ,(2, 'mike', 'harlin', 36)
    ,(3, 'pat', 'douglas', 42)
]

# Create the model with sample data
df = sdm.SQLDataModel(data, headers)

# Example: Rename the header at index 1 to 'first_name'
df.rename_header(1, 'first_name')

# Get current values
new_headers = df.get_headers()

# Outputs ['first_name', 'last', 'age']
print(new_headers)

Note

  • The method allows renaming a column identified by its index in the SQLDataModel.

  • Handles negative indices by adjusting them relative to the end of the column range.

  • If an error occurs during SQL execution, it rolls back the changes and raises a SQLProgrammingError with an informative message.

Changelog:

rename_headers(new_headers: list[str] | Callable[[list[str]], list[str]]) None[source]

Renames the current SQLDataModel headers to values provided in new_headers. Headers must match the existing column count.

Parameters:

new_headers (list[str] or Callable[[list[str]], list[str]]) – A sequence (e.g., list, tuple) of new header names or a callable that takes the existing headers and returns a new list of header names.

Raises:
  • TypeError – If the new_headers type is not a valid type (list or tuple) or contains instances that are not of type ‘str’.

  • DimensionError – If the length of new_headers does not match the column count.

  • TypeError – If the type of the first element in new_headers is not a valid type (str, int, or float).

Returns:

None

Example:

import sqldatamodel as sdm

# Create model
df = sdm.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# Set new headers
df.rename_headers(['First_Name', 'Last_Name', 'Payment'])

# Alternatively, provide a callable argument to transform headers using existing names
df.rename_headers(lambda headers: [header.replace(' ', '_') for header in headers])

Changelog:

replace(pattern: str, replacement: str, inplace: bool = False, **kwargs) SQLDataModel[source]

Replaces matching occurrences of a specified pattern with a replacement value in the SQLDataModel instance. If inplace is True, the method updates the existing SQLDataModel; otherwise, it returns a new SQLDataModel with the replacements applied.

Parameters:
  • pattern (str) – The substring or regular expression pattern to search for in each column.

  • replacement (str) – The string to replace the matched pattern with.

  • inplace (bool, optional) – If True, modifies the current SQLDataModel instance in-place. Default is False.

  • **kwargs – Additional keyword arguments to be passed to the execute_fetch method when not in-place.

Raises:

TypeError – If the pattern or replacement parameters are invalid types.

Returns:

If inplace=True, modifies the current instance in-place and returns None. Otherwise, returns a new SQLDataModel with the specified replacements applied.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

headers = ['first', 'last', 'age', 'service']
data = [
    ('John', 'Smith', 27, 1.22),
    ('Sarah', 'West', 39, 0.7),
    ('Mike', 'Harlin', 36, 3),
    ('Pat', 'Douglas', 42, 11.5)
]

# Create the model
df = sdm.SQLDataModel(data, headers, display_float_precision=2, display_index=False)

# Replace 'John' in the 'first' column
df['first'] = df['first'].replace("John","Jane")

# View model
print(df)

This will output:

┌───────┬─────────┬──────┬─────────┐
│ first │ last    │  age │ service │
├───────┼─────────┼──────┼─────────┤
│ Jane  │ Smith   │   27 │    1.22 │
│ Sarah │ West    │   39 │    0.70 │
│ Mike  │ Harlin  │   36 │    3.00 │
│ Pat   │ Douglas │   42 │   11.50 │
└───────┴─────────┴──────┴─────────┘
[4 rows x 4 columns]
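
For plain substring patterns, a comparable effect can be seen with SQLite's built-in replace() function. The sketch below uses the standard library sqlite3 module with a hypothetical people table; it covers only the substring case and makes no claim about how the method treats regular expressions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (first TEXT, last TEXT)")
con.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("John", "Smith"), ("Sarah", "West"), ("Johnny", "Cash")],
)

# replace() rewrites every occurrence of the substring, including partial matches.
con.execute("UPDATE people SET first = replace(first, ?, ?)", ("John", "Jane"))

rows = con.execute("SELECT first, last FROM people ORDER BY rowid").fetchall()
print(rows)
```

Because the match is on substrings, 'Johnny' becomes 'Janeny' in this sketch, which is worth keeping in mind when choosing a pattern.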

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

reset_index(start_index: int = 0) None[source]

Resets the index of the SQLDataModel instance in place to a zero-based sequential autoincrement, or to the specified start_index base with sequential incrementation.

Parameters:

start_index (int, optional) – The starting index for the reset operation. Defaults to 0.

Raises:
  • TypeError – If provided start_index argument is not of type int

  • ValueError – If the specified start_index is greater than the minimum index in the current model.

  • SQLProgrammingError – If reset index execution results in constraint violation or programming error.

Example:

import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age', 'service']
data = [
    (994, 'john', 'smith', 27, 1.22),
    (995, 'sarah', 'west', 39, 0.7),
    (996, 'mike', 'harlin', 36, 3),
    (997, 'pat', 'douglas', 42, 11.5)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# View current state
print(df)

This will output:

┌─────┬────────┬─────────┬────────┬─────────┐
│     │ first  │ last    │    age │ service │
├─────┼────────┼─────────┼────────┼─────────┤
│ 994 │ john   │ smith   │     27 │    1.22 │
│ 995 │ sarah  │ west    │     39 │    0.70 │
│ 996 │ mike   │ harlin  │     36 │    3.00 │
│ 997 │ pat    │ douglas │     42 │   11.50 │
└─────┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]

Now reset the index column:

# Reset the index with default start value
df.reset_index()

# View updated model
print(df)

This will output:

┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john   │ smith   │     27 │    1.22 │
│ 1 │ sarah  │ west    │     39 │    0.70 │
│ 2 │ mike   │ harlin  │     36 │    3.00 │
│ 3 │ pat    │ douglas │     42 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]

Reset the index to a custom value:

# Reset the index with a different value
df.reset_index(start_index = -3)

# View updated model
print(df)

This will output:

┌────┬────────┬─────────┬────────┬─────────┐
│    │ first  │ last    │    age │ service │
├────┼────────┼─────────┼────────┼─────────┤
│ -3 │ john   │ smith   │     27 │    1.22 │
│ -2 │ sarah  │ west    │     39 │    0.70 │
│ -1 │ mike   │ harlin  │     36 │    3.00 │
│  0 │ pat    │ douglas │     42 │   11.50 │
└────┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]

Note

  • The current index should be viewed more as a soft row number, to assign hard indices use SQLDataModel.freeze_index() method.

  • Setting start_index to a very large negative or positive integer may lead to unpredictable behavior.
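The renumbering itself amounts to assigning sequential values from start_index in the existing row order. A plain Python sketch, where the reset_index function and the (index, value) row layout are assumptions for illustration only:

```python
def reset_index(rows, start_index=0):
    """Return rows renumbered sequentially from start_index, keeping row order."""
    if not isinstance(start_index, int):
        raise TypeError("start_index must be of type 'int'")
    # Rows are ordered by their current index, then numbered start_index, start_index + 1, ...
    ordered = sorted(rows, key=lambda row: row[0])
    return [(start_index + offset, *row[1:]) for offset, row in enumerate(ordered)]

rows = [(994, "john"), (995, "sarah"), (996, "mike"), (997, "pat")]
print(reset_index(rows))
print(reset_index(rows, start_index=-3))
```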

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

row_count[source]

The current row count of the model.

Type:

int

sample(n_samples: float | int = 0.05, **kwargs) SQLDataModel[source]

Return a random sample of size n_samples as a new SQLDataModel.

Parameters:

n_samples (float | int) – Number of rows or proportion of rows to sample. Default set to 0.05, proportional to 5% of the current SQLDataModel.row_count. If n_samples is an integer, it represents the exact number of rows to sample where 0 < n_samples <= row_count. If n_samples is a float, it represents the proportion of rows to sample where 0.0 < n_samples <= 1.0.

Returns:

A new SQLDataModel instance containing the sampled rows.

Return type:

SQLDataModel

Raises:
  • TypeError – If the n_samples parameter is not of type ‘int’ or ‘float’.

  • ValueError – If the n_samples value is invalid or out of range.

This method generates a random sample of rows from the current SQLDataModel. The number of rows to sample can be specified either as an integer representing the exact number of rows or as a float representing the proportion of rows to sample. The sampled rows are returned as a new SQLDataModel instance.

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Amount'])

# Example 1: Sample 10 random rows
sample_result = df.sample(n_samples=10)

# Create the model
df2 = sdm.from_csv('another_example.csv', headers=['Code', 'Description', 'Price'])

# Example 2: Sample 20% of rows
sample_result2 = df2.sample(n_samples=0.2)

Note

  • If the current model’s SQLDataModel.row_count value is less than the sample size, the current row count will be used instead.
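
The way n_samples is interpreted can be sketched in plain Python. The helper resolve_sample_size is hypothetical, and the rounding of fractional sizes is an assumption, since the documentation does not say how a proportion is converted to a row count.

```python
import random

def resolve_sample_size(n_samples, row_count):
    """Translate n_samples into a concrete number of rows (hypothetical helper)."""
    if not isinstance(n_samples, (int, float)):
        raise TypeError(f"n_samples must be int or float, got {type(n_samples).__name__}")
    if isinstance(n_samples, float):
        if not 0.0 < n_samples <= 1.0:
            raise ValueError("float n_samples must satisfy 0.0 < n_samples <= 1.0")
        # Assumption: round to the nearest row and always sample at least one
        n_samples = max(1, round(n_samples * row_count))
    elif n_samples <= 0:
        raise ValueError("int n_samples must be greater than 0")
    # Never ask for more rows than the model holds
    return min(n_samples, row_count)

row_indices = list(range(50))
size = resolve_sample_size(0.2, len(row_indices))
sampled = random.sample(row_indices, size)
print(size, len(sampled))  # 10 10
```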

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

set_column_alignment(alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic') None[source]

Sets the default alignment behavior for SQLDataModel when repr or print is called by modifying the column_alignment attribute. The default, 'dynamic', right-aligns numeric data types and left-aligns all other types, with headers matching value alignment.

Parameters:

alignment (str) – The column alignment setting to use:

  • 'dynamic': Default behavior, dynamically aligns columns based on column data types.

  • 'left': Left-align all column values.

  • 'center': Center-align all column values.

  • 'right': Right-align all column values.

Raises:
  • TypeError – If the argument for alignment is not of type ‘str’.

  • ValueError – If the provided alignment is not one of ‘dynamic’, ‘left’, ‘center’, ‘right’.

Returns:

None

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Set to right-align columns
df.set_column_alignment('right')

# Output
print(df)

This will output the model with values right-aligned:

┌───┬────────┬─────────┬────────┬─────────┐
│   │  first │    last │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │   john │   smith │     27 │    1.22 │
│ 1 │  sarah │    west │     39 │    0.70 │
│ 2 │   mike │  harlin │     36 │    3.00 │
│ 3 │    pat │ douglas │     42 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘

Setting columns to be left-aligned:

# Set to left-align
df.set_column_alignment('left')

# Output
print(df)

This will output the model with left-aligned values instead:

┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │ age    │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john   │ smith   │  27    │  1.22   │
│ 1 │ sarah  │ west    │  39    │  0.70   │
│ 2 │ mike   │ harlin  │  36    │  3.00   │
│ 3 │ pat    │ douglas │  42    │  11.50  │
└───┴────────┴─────────┴────────┴─────────┘

Note

  • Use SQLDataModel.get_column_alignment() to return the current column alignment setting.

  • When using ‘center’, if the column contents cannot be perfectly centered, the left side will be favored.

  • Use ‘dynamic’ to return to default column alignment, which is right-aligned for numeric types and left-aligned for others.

  • See SQLDataModel.set_table_style() for modifying table format and available styles.
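
The four alignment settings map naturally onto Python's format specification characters. The following is a minimal sketch under that assumption; align_cell and ALIGNMENT_SPECS are hypothetical names, and the library's own rendering code may differ. Python's '^' specifier puts any odd padding character on the right, which matches the note about the left side being favored when centering.

```python
ALIGNMENT_SPECS = {"left": "<", "center": "^", "right": ">"}

def align_cell(value, width, alignment="dynamic"):
    """Pad a value to width using the given alignment (hypothetical helper)."""
    if not isinstance(alignment, str):
        raise TypeError("alignment must be of type 'str'")
    if alignment == "dynamic":
        # Numeric values go right, everything else goes left
        spec = ">" if isinstance(value, (int, float)) else "<"
    elif alignment in ALIGNMENT_SPECS:
        spec = ALIGNMENT_SPECS[alignment]
    else:
        raise ValueError("alignment must be one of 'dynamic', 'left', 'center', 'right'")
    return f"{str(value):{spec}{width}}"

print(repr(align_cell(27, 6)))               # '    27'
print(repr(align_cell("john", 6)))           # 'john  '
print(repr(align_cell("pat", 6, "center")))  # ' pat  '
```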

Changelog:
  • Version 0.1.80 (2024-02-24):
    • Changed expected values for alignment parameter from f-string modifiers to more descriptive values ‘dynamic’, ‘left’, ‘center’ or ‘right’.

set_column_dtypes(column: str | int | dict, dtype: Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str'] = None) None[source]

Casts the specified column into the provided python dtype using the equivalent SQL data type.

Parameters:
  • column (str or int or dict) – The name or index of the column to be cast, or a dictionary mapping column names to dtypes. If a dictionary, keys are column names or indices and values are the dtypes.

  • dtype (Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str']) – The target Python data type for the specified column. Ignored if column is a dictionary.

Raises:
  • TypeError – If column is not of type ‘str’, ‘int’, or ‘dict’, or if any dtype is invalid.

  • IndexError – If column is an integer and the index is outside of the current model range.

  • ValueError – If column is a string and the column is not found in the current model.

Returns:

The model’s column data types are cast to the new type and nothing is returned.

Return type:

None

Example:

import sqldatamodel as sdm

# Sample data
headers = ['idx', 'First', 'Last', 'Age']
data = [
    (0, 'John', 'Smith', 27),
    (1, 'Sarah', 'West', 29),
    (2, 'Mike', 'Harlin', 36),
    (3, 'Pat', 'Douglas', 42),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Original dtype for comparison
old_dtype = df.get_column_dtypes('Age')

# Set the data type of the 'Age' column to 'float'
df.set_column_dtypes('Age', 'float')

# Confirm column dtype
new_dtype = df.get_column_dtypes('Age')

# View result
print(f"Age dtype: {old_dtype} -> {new_dtype}")

This will output:

Age dtype: int -> float

Warning

  • Type casting will coerce any nonconforming values to the dtype being set, which means data will be lost if cast incorrectly.

Note

  • Column data types are mapped to SQL types and not Python class types, see sqlite3 docs for additional information.

  • See SQLDataModel.infer_dtypes() to automatically infer the correct column data types using random sampling.
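
The data loss mentioned in the warning can be seen directly in SQLite's CAST behavior. This sketch uses only the standard library sqlite3 module and assumes that casting a column ends up applying SQLite's conversion rules to each value; it does not call SQLDataModel itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each CAST mimics setting a column to 'float' or 'int' for a single value
row = conn.execute(
    "SELECT CAST(27 AS REAL), CAST('42' AS INTEGER), CAST(3.9 AS INTEGER), "
    "CAST('12abc' AS INTEGER), CAST('abc' AS INTEGER)"
).fetchone()

print(row)  # (27.0, 42, 3, 12, 0)
```

The last three values show where information is lost: the fractional part is truncated, trailing text is dropped, and text with no numeric prefix becomes 0.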

Changelog:
  • Version 0.7.9 (2024-06-20):
    • Modified to allow column argument to be provided as a dictionary mapping column names to dtypes to reflect current structure at SQLDataModel.dtypes.

  • Version 0.1.9 (2024-03-19):
    • New method.

set_display_color(color: str | tuple = None, rand_color: bool = False) None[source]

Sets the table string representation color when SQLDataModel is displayed in the terminal, selecting a random color if rand_color = True.

Parameters:
  • color (str or tuple) – Color to set. Accepts hex value (e.g., '#A6D7E8') or tuple of RGB values e.g., (166, 215, 232). When not provided or color = None, the terminal default color will be used.

  • rand_color (bool, optional) – Set a random color from a preselected pool of options. When True color will be ignored and a random color selected instead.

Returns:

The color value is set at SQLDataModel.display_color and nothing is returned.

Return type:

None

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['Name', 'Age', 'Salary'])

# Set color using hex value
df.set_display_color('#A6D7E8')

# Set color using rgb value
df.set_display_color((166, 215, 232))

If you’re unsure of which color to use, have one selected for you:

# Surprise me! Use a random color
df.set_display_color(rand_color=True)

# View the value set
print(df.display_color)

In this case we got a nice ‘plum’ color:

ANSIColor('#F6A8CC')

To reset to the default terminal color:

# Set color to None
df.set_display_color(color=None)

# View the value set
print(df.display_color) # None

This will return None, signifying the default terminal color will be used.

Note

  • By default, no color styling is applied and the native terminal color is used.

  • To use rgb values, ensure a single tuple is provided as an argument.

  • When rand_color = True a random color is selected from a preexisting pool, see SQLDataModel.ANSIColor.ANSIColor.rand_color() for more details.
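
Both accepted color forms describe the same RGB triple. The sketch below normalizes either form and wraps text in a standard 24-bit ANSI foreground escape sequence; to_rgb and colorize are hypothetical helpers, and the library's ANSIColor class may build its escape codes differently.

```python
def to_rgb(color):
    """Normalize a '#RRGGBB' string or an (r, g, b) tuple to an (r, g, b) tuple."""
    if isinstance(color, str):
        digits = color.lstrip("#")
        if len(digits) != 6:
            raise ValueError(f"expected a hex color like '#A6D7E8', got {color!r}")
        return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
    if isinstance(color, tuple) and len(color) == 3:
        return color
    raise TypeError("color must be a hex string or a tuple of three RGB values")

def colorize(text, color=None):
    """Wrap text in a 24-bit ANSI foreground color, or return it unchanged for None."""
    if color is None:
        return text
    r, g, b = to_rgb(color)
    return f"\033[38;2;{r};{g};{b}m{text}\033[0m"

print(to_rgb("#A6D7E8"))  # (166, 215, 232)
```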

Changelog:
  • Version 0.12.0 (2024-07-06):
    • Added rand_color argument to require explicit selection for random color and return color = None to instead reset color to terminal default.

  • Version 0.10.2 (2024-06-30):
    • Modified to randomly select a color from preselected pool when color = None for demonstration purposes, see SQLDataModel.ANSIColor for more details.

  • Version 0.7.0 (2024-06-08):
    • Removed warning message and modified to raise exception on failure to create display color pen.

  • Version 0.1.5 (2023-11-24):
    • New method.

set_display_float_precision(float_precision: int) None[source]

Sets the current float display precision to the specified value for use in the repr method of the SQLDataModel when representing float data types. Note that this precision limit is overridden by the max_column_width value if the precision limit exceeds the specified maximum width.

Parameters:

float_precision (int) – The desired float display precision to be used for real number values.

Raises:
  • TypeError – If the float_precision argument is not of type ‘int’.

  • ValueError – If the float_precision argument is a negative value, as it must be a valid f-string precision identifier.

Returns:

None

Example:

import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age', 'service_time']
data = [
    (0, 'john', 'smith', 27, 2.1)
    ,(1, 'sarah', 'west', 29, 0.8)
    ,(2, 'mike', 'harlin', 36, 1.3)
    ,(3, 'pat', 'douglas', 42, 7.02)
]

# Create the model with sample data
df = sdm.SQLDataModel(data,headers)

# Example: Set the float display precision to 2
df.set_display_float_precision(2)

# View model
print(df)

This will output:

┌───┬────────┬─────────┬────────┬──────────────┐
│   │ first  │ last    │    age │ service_time │
├───┼────────┼─────────┼────────┼──────────────┤
│ 0 │ john   │ smith   │     27 │         2.10 │
│ 1 │ sarah  │ west    │     29 │         0.80 │
│ 2 │ mike   │ harlin  │     36 │         1.30 │
│ 3 │ pat    │ douglas │     42 │         7.02 │
└───┴────────┴─────────┴────────┴──────────────┘

Use SQLDataModel.get_display_float_precision() to get the current value set:

# Get the updated float display precision
updated_precision = df.get_display_float_precision()

# Outputs 2
print(updated_precision)

Note

  • The display_float_precision attribute only affects the precision for displaying real or floating point values.

  • The actual precision of the stored value in the model is unaffected by the value set.
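
Since the precision must be a valid f-string precision identifier, the display step can be pictured as ordinary f-string formatting. The helper format_float is hypothetical and only illustrates that formatting changes the displayed text, not the stored value.

```python
def format_float(value, precision=2):
    """Format a float for display with a fixed number of decimal places."""
    if not isinstance(precision, int):
        raise TypeError("float_precision must be of type 'int'")
    if precision < 0:
        raise ValueError("float_precision must be a non-negative integer")
    return f"{value:.{precision}f}"

service_time = 11.5
print(format_float(service_time, 2))  # 11.50
print(format_float(0.7, 2))           # 0.70
print(service_time)                   # 11.5
```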

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

set_display_index(display_index: bool) None[source]

Sets the value for SQLDataModel.display_index to enable or disable the inclusion of the SQLDataModel index value in print or repr calls.

Parameters:

display_index (bool) – Whether or not to include the index in SQLDataModel representations.

Raises:

TypeError – If the provided argument is not a boolean value.

Returns:

None

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Disable displaying index
df.set_display_index(False)

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

set_display_max_rows(rows: int | None) None[source]

Sets value at SQLDataModel.display_max_rows to limit maximum rows displayed when repr or print is called. Use rows = None to derive max number to display from the current terminal height.

Parameters:

rows (int | None) – The maximum number of rows to display, or None to derive the limit from the current terminal height.

Raises:
  • TypeError – If the provided argument is neither None nor an integer.

  • IndexError – If the provided value is an integer less than or equal to 0.

Returns:

None

Example:

import sqldatamodel as sdm

# Create model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Any call to `print` or `repr` will be restricted to 500 max rows
df.set_display_max_rows(500)

# Alternatively, auto-detect dimensions by setting to `None`
df.set_display_max_rows(None)
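
One plausible way to derive a row limit from the terminal height uses shutil.get_terminal_size from the standard library. The helper resolve_max_rows and the reserved_lines allowance are assumptions made for this sketch; the documentation does not state how many lines the library reserves.

```python
import shutil

def resolve_max_rows(rows=None, reserved_lines=6):
    """Return an explicit row limit, or derive one from the terminal height."""
    if rows is None:
        # reserved_lines is an assumed allowance for borders, header and prompt
        terminal_height = shutil.get_terminal_size(fallback=(80, 24)).lines
        return max(1, terminal_height - reserved_lines)
    if not isinstance(rows, int):
        raise TypeError("rows must be None or an integer")
    if rows <= 0:
        raise IndexError("rows must be greater than 0")
    return rows

print(resolve_max_rows(500))  # 500
```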

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

set_headers(new_headers: list[str] | dict[str | int, str] | Callable[[list[str]], list[str]]) None[source]

Renames the current SQLDataModel headers to values in new_headers, which can be provided as:

  1. A sequence (list/tuple) of strings matching the full column count.

  2. A dictionary mapping existing indices (int) or names (str) to new names (str).

  3. A callable that transforms the existing header list.

Parameters:

new_headers (list[str] | dict[str|int, str] | Callable) – The new header configuration:

  • If list/tuple: Must be the same length as the column count.

  • If dict: Keys can be current column names or indices (supporting negative indexing). Values are new names. Unspecified columns remain unchanged.

  • If callable: Receives the current headers list and returns the new headers list.

Raises:
  • TypeError – If new_headers contains invalid types.

  • DimensionError – If the length of the resulting header list does not match the column count.

  • IndexError – If a dictionary key is an integer index out of bounds.

  • ValueError – If a dictionary key is a string name not found in current headers.

  • SQLProgrammingError – If an error is encountered while attempting to rename the SQL columns.

Returns:

None

Example:

import sqldatamodel as sdm

# Create model
df = sdm.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# 1. Rename all using list
df.set_headers(['First_Name', 'Last_Name', 'Payment'])

# 2. Rename specific columns using dict (mixed keys allowed)
# Rename 'Payment' (index -1) to 'Annual_Salary' and 'First_Name' to 'FName'
df.set_headers({-1: 'Annual_Salary', 'First_Name': 'FName'})

# 3. Transform using callable
df.set_headers(lambda headers: [h.upper() for h in headers])

Changelog:
  • Version 2.2.0 (2025-11-28):
    • Added support for dict input to allow partial renaming and mixed index/name keys.

  • Version 2.1.1 (2025-11-25):
    • Modified to retain original column ordering when existing model headers match a subset of new_headers.

  • Version 1.2.0 (2025-01-28):
    • Added ability to provide a callable for new_headers.

  • Version 0.1.5 (2023-11-24):
    • New method.
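
The three input forms accepted by set_headers can all be reduced to a full header list before any renaming happens. The sketch below shows one way to do that; resolve_headers is a hypothetical helper, and ValueError stands in for the library's DimensionError, which is not available outside the package.

```python
def resolve_headers(current, new_headers):
    """Return the full header list produced by new_headers (hypothetical helper)."""
    current = list(current)
    if callable(new_headers):
        resolved = list(new_headers(list(current)))
    elif isinstance(new_headers, dict):
        resolved = list(current)
        for key, new_name in new_headers.items():
            if isinstance(key, int):
                # Negative keys index from the end; out of range raises IndexError
                resolved[key] = new_name
            elif key in current:
                resolved[current.index(key)] = new_name
            else:
                raise ValueError(f"column {key!r} not found in current headers")
    elif isinstance(new_headers, (list, tuple)):
        resolved = list(new_headers)
    else:
        raise TypeError("new_headers must be a list, tuple, dict or callable")
    if len(resolved) != len(current):
        # Stand-in for the library's DimensionError
        raise ValueError("header count does not match column count")
    return resolved

headers = ["First_Name", "Last_Name", "Payment"]
print(resolve_headers(headers, {-1: "Annual_Salary", "First_Name": "FName"}))
# ['FName', 'Last_Name', 'Annual_Salary']
```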

set_max_column_width(width: int) None[source]

Set max_column_width as the maximum number of characters per column when repr or print is called.

Parameters:

width (int) – The maximum width for each column.

Returns:

Sets the max_column_width property.

Return type:

None

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Change the max column width for the table representation
df.set_max_column_width(20)

Note

  • If max_column_width is set to a value below the current min_column_width property, the maximum width will override the minimum width.

  • The minimum required width is 2, when max_column_width < 2, 2 will be used regardless of the width provided.

  • See SQLDataModel.set_min_column_width() to set minimum column width for table representations.

set_min_column_width(width: int) None[source]

Set min_column_width as the minimum number of characters per column when repr or print is called.

Parameters:

width (int) – The minimum width for each column.

Returns:

Sets the min_column_width property.

Return type:

None

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Set a new minimum column width value
df.set_min_column_width(8)

# Check updated value
print(df.min_column_width) # 8

Note

  • If min_column_width is set to a value above the current max_column_width property, the maximum width will override the minimum width.

  • The minimum required width is 2, when min_column_width < 2, 2 will be used regardless of the width provided.

  • See SQLDataModel.set_max_column_width() to set maximum column width for table representations.
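
Taken together, the notes for set_max_column_width and set_min_column_width describe two rules: both widths have a floor of 2, and the maximum width wins when the two conflict. The sketch below encodes those rules. The helper names are hypothetical, and the plain truncation in fit_cell is an assumption, since the library may mark truncated values differently.

```python
def effective_widths(min_width, max_width):
    """Apply the floor of 2 and let the maximum override the minimum."""
    min_width = max(min_width, 2)
    max_width = max(max_width, 2)
    return min(min_width, max_width), max_width

def fit_cell(text, min_width, max_width):
    """Pad text up to the minimum width and cut it at the maximum width."""
    low, high = effective_widths(min_width, max_width)
    # Assumption: plain truncation with no truncation marker
    return text[:high].ljust(low)

print(effective_widths(10, 5))          # (5, 5)
print(repr(fit_cell("pat", 5, 38)))     # 'pat  '
print(repr(fit_cell("douglas", 3, 5)))  # 'dougl'
```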

set_model_name(new_name: str) None[source]

Sets the new SQLDataModel table name that will be used as an alias for any SQL queries executed by the user or internally.

Parameters:

new_name (str) – The new table name for the SQLDataModel.

Raises:

SQLProgrammingError – If unable to rename the model table due to SQL execution failure.

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['Column1', 'Column2'])

# Rename the model
df.set_model_name('custom_table')

Note

  • The provided value must be a valid SQL table name.

  • This alias is not carried over to new SQLDataModel instances, which use the default value 'sdm'.
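
At the SQL level, renaming a table is done with ALTER TABLE ... RENAME TO. The sketch below shows the statement against a stand-in sqlite3 table; rename_table is a hypothetical helper, and whether set_model_name issues exactly this statement is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "sdm" (idx INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)')
conn.execute("INSERT INTO sdm VALUES (0, 'a', 'b')")

def rename_table(conn, old_name, new_name):
    """Rename a table, quoting both identifiers (hypothetical helper)."""
    # Identifiers cannot be bound as parameters, so escape and quote them
    old_q = old_name.replace('"', '""')
    new_q = new_name.replace('"', '""')
    conn.execute(f'ALTER TABLE "{old_q}" RENAME TO "{new_q}"')

rename_table(conn, "sdm", "custom_table")
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['custom_table']
```

Queries written after the rename must reference the new name, for example SELECT Column1 FROM custom_table.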

Changelog:
  • Version 0.1.5 (2023-11-24):
    • New method.

set_table_style(style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = 'default') None[source]

Sets the table style used for string representations of SQLDataModel.

Parameters:

style (Literal['ascii','bare','dash','default','double','latex','list','markdown','outline','pandas','polars','postgresql','round','rst-grid','rst-simple']) – The table styling to set. Setting to 'default' style will return the style representation to the original format.

Raises:

ValueError – If style provided is not one of the currently supported options ‘ascii’, ‘bare’, ‘dash’, ‘default’, ‘double’, ‘latex’, ‘list’, ‘markdown’, ‘outline’, ‘pandas’, ‘polars’, ‘postgresql’, ‘round’, ‘rst-grid’ or ‘rst-simple’.

Returns:

None

Examples:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height', 'Birthday']
data = [
    ('Alice', 28, 162.08, '1996-11-20'),
    ('Bobby', 30, 175.36, '1994-06-15'),
    ('Craig', 37, 185.82, '1987-01-07'),
    ('David', 32, 179.75, '1992-12-28')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Let's try the round style
df.set_table_style('round')

# View it
print(df)

This outputs the 'round' table style:

╭───────┬─────┬─────────┬────────────╮
│ Name  │ Age │  Height │ Birthday   │
├───────┼─────┼─────────┼────────────┤
│ Alice │  28 │  162.08 │ 1996-11-20 │
│ Bobby │  30 │  175.36 │ 1994-06-15 │
│ Craig │  37 │  185.82 │ 1987-01-07 │
│ David │  32 │  179.75 │ 1992-12-28 │
╰───────┴─────┴─────────┴────────────╯

Alternatively, set style = 'ascii' to format SQLDataModel in the ASCII style, the OG of terminal tables:

+-------+-----+---------+------------+
| Name  | Age |  Height | Birthday   |
+-------+-----+---------+------------+
| Alice |  28 |  162.08 | 1996-11-20 |
| Bobby |  30 |  175.36 | 1994-06-15 |
| Craig |  37 |  185.82 | 1987-01-07 |
| David |  32 |  179.75 | 1992-12-28 |
+-------+-----+---------+------------+

Set style = 'bare' to format SQLDataModel in the following style:

Name   Age   Height  Birthday
-------------------------------
Alice   28   162.08  1996-11-20
Bobby   30   175.36  1994-06-15
Craig   37   185.82  1987-01-07
David   32   179.75  1992-12-28

Set style = 'dash' to format SQLDataModel with dashes for internal borders:

┌───────┬─────┬─────────┬────────────┐
│ Name  ╎ Age ╎  Height ╎ Birthday   │
├╴╴╴╴╴╴╴┼╴╴╴╴╴┼╴╴╴╴╴╴╴╴╴┼╴╴╴╴╴╴╴╴╴╴╴╴┤
│ Alice ╎  28 ╎  162.08 ╎ 1996-11-20 │
│ Bobby ╎  30 ╎  175.36 ╎ 1994-06-15 │
│ Craig ╎  37 ╎  185.82 ╎ 1987-01-07 │
│ David ╎  32 ╎  179.75 ╎ 1992-12-28 │
└───────┴─────┴─────────┴────────────┘

Set style = 'default' to format SQLDataModel in the following style, which also happens to be the default styling applied:

┌───────┬─────┬─────────┬────────────┐
│ Name  │ Age │  Height │ Birthday   │
├───────┼─────┼─────────┼────────────┤
│ Alice │  28 │  162.08 │ 1996-11-20 │
│ Bobby │  30 │  175.36 │ 1994-06-15 │
│ Craig │  37 │  185.82 │ 1987-01-07 │
│ David │  32 │  179.75 │ 1992-12-28 │
└───────┴─────┴─────────┴────────────┘
[4 rows x 4 columns]

Set style = 'list' to format SQLDataModel as a list of values, similar to the SQLite CLI representation:

Name   Age   Height  Birthday
-----  ---  -------  ----------
Alice   28   162.08  1996-11-20
Bobby   30   175.36  1994-06-15
Craig   37   185.82  1987-01-07
David   32   179.75  1992-12-28

Set style = 'double' to format SQLDataModel using double line borders:

╔═══════╦═════╦═════════╦════════════╗
║ Name  ║ Age ║  Height ║ Birthday   ║
╠═══════╬═════╬═════════╬════════════╣
║ Alice ║  28 ║  162.08 ║ 1996-11-20 ║
║ Bobby ║  30 ║  175.36 ║ 1994-06-15 ║
║ Craig ║  37 ║  185.82 ║ 1987-01-07 ║
║ David ║  32 ║  179.75 ║ 1992-12-28 ║
╚═══════╩═════╩═════════╩════════════╝

Set style = 'markdown' to format SQLDataModel in the Markdown style:

| Name  | Age |  Height | Birthday   |
|-------|-----|---------|------------|
| Alice |  28 |  162.08 | 1996-11-20 |
| Bobby |  30 |  175.36 | 1994-06-15 |
| Craig |  37 |  185.82 | 1987-01-07 |
| David |  32 |  179.75 | 1992-12-28 |

Set style = 'outline' to format SQLDataModel in the following style:

┌─────────────────────────────────┐
│ Name   Age   Height  Birthday   │
├─────────────────────────────────┤
│ Alice   28   162.08  1996-11-20 │
│ Bobby   30   175.36  1994-06-15 │
│ Craig   37   185.82  1987-01-07 │
│ David   32   179.75  1992-12-28 │
└─────────────────────────────────┘

Set style = 'pandas' to format SQLDataModel in the style used by Pandas DataFrames:

Name   Age   Height  Birthday
Alice   28   162.08  1996-11-20
Bobby   30   175.36  1994-06-15
Craig   37   185.82  1987-01-07
David   32   179.75  1992-12-28

Set style = 'polars' to format SQLDataModel in the style used by Polars DataFrames:

┌───────┬─────┬─────────┬────────────┐
│ Name  ┆ Age ┆  Height ┆ Birthday   │
╞═══════╪═════╪═════════╪════════════╡
│ Alice ┆  28 ┆  162.08 ┆ 1996-11-20 │
│ Bobby ┆  30 ┆  175.36 ┆ 1994-06-15 │
│ Craig ┆  37 ┆  185.82 ┆ 1987-01-07 │
│ David ┆  32 ┆  179.75 ┆ 1992-12-28 │
└───────┴─────┴─────────┴────────────┘

Set style = 'postgresql' to format SQLDataModel in the style used by PostgreSQL:

Name  | Age |  Height | Birthday
------+-----+---------+-----------
Alice |  28 |  162.08 | 1996-11-20
Bobby |  30 |  175.36 | 1994-06-15
Craig |  37 |  185.82 | 1987-01-07
David |  32 |  179.75 | 1992-12-28

Set style = 'rst-grid' to format SQLDataModel in the style required for Sphinx and reStructuredText grid tables:

+-------+-----+---------+------------+
| Name  | Age |  Height | Birthday   |
+=======+=====+=========+============+
| Alice |  28 |  162.08 | 1996-11-20 |
| Bobby |  30 |  175.36 | 1994-06-15 |
| Craig |  37 |  185.82 | 1987-01-07 |
| David |  32 |  179.75 | 1992-12-28 |
+-------+-----+---------+------------+

Set style = 'rst-simple' to format SQLDataModel in the style required for Sphinx and reStructuredText simple tables:

=====  ===  =======  ==========
Name   Age   Height  Birthday
=====  ===  =======  ==========
Alice   28   162.08  1996-11-20
Bobby   30   175.36  1994-06-15
Craig   37   185.82  1987-01-07
David   32   179.75  1992-12-28
=====  ===  =======  ==========

Set style = 'latex' to format SQLDataModel in the style of a LaTeX table:

\hline
    Name  & Age &  Height & Birthday   \\
\hline
    Alice &  28 &  162.08 & 1996-11-20 \\
    Bobby &  30 &  175.36 & 1994-06-15 \\
    Craig &  37 &  185.82 & 1987-01-07 \\
    David &  32 &  179.75 & 1992-12-28 \\
\hline

However, SQLDataModel.to_latex() should be used to format complete table elements for LaTeX files.

Note

  • The labels given to certain styles are entirely subjective and do not in any way express original design or ownership of the styling used.

  • Legacy character sets on older terminals may not support all the character encodings required for some styles.

  • See SQLDataModel._generate_table_style() for implementation details related to each format.
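
Most of the bordered styles shown above differ only in the characters used for corners, junctions and rules. The sketch below shows that idea for three of the styles; STYLES and render are hypothetical and are not the implementation behind SQLDataModel._generate_table_style(). As a simplification, headers are always left-aligned here.

```python
STYLES = {
    "default": {"top": "┌┬┐", "mid": "├┼┤", "bot": "└┴┘", "h": "─", "v": "│"},
    "round": {"top": "╭┬╮", "mid": "├┼┤", "bot": "╰┴╯", "h": "─", "v": "│"},
    "ascii": {"top": "+++", "mid": "+++", "bot": "+++", "h": "-", "v": "|"},
}

def render(headers, rows, style="default"):
    """Render headers and rows as a bordered table in the given style."""
    if style not in STYLES:
        raise ValueError(f"unsupported style: {style!r}")
    s = STYLES[style]
    # Each column is as wide as its longest header or value
    widths = [max(len(str(v)) for v in col) for col in zip(headers, *rows)]

    def rule(chars):
        left, mid, right = chars
        return left + mid.join(s["h"] * (w + 2) for w in widths) + right

    def line(values):
        cells = []
        for value, width in zip(values, widths):
            text = str(value)
            # Numbers are right-aligned, everything else left-aligned
            numeric = isinstance(value, (int, float))
            cells.append(text.rjust(width) if numeric else text.ljust(width))
        return s["v"] + s["v"].join(f" {c} " for c in cells) + s["v"]

    body = [line(row) for row in rows]
    return "\n".join([rule(s["top"]), line(headers), rule(s["mid"]), *body, rule(s["bot"])])

print(render(["Name", "Age"], [("Alice", 28), ("Bobby", 30)], style="round"))
```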

Changelog:
  • Version 0.11.0 (2024-07-05):
    • Added style 'latex' to generate LaTeX style tables.

  • Version 0.9.3 (2024-06-28):
    • Added styles 'rst-grid' and 'rst-simple' to allow SQLDataModel to generate table formats used by Sphinx and reStructuredText.

  • Version 0.3.11 (2024-04-18):
    • Removed 'thick' style and added 'list' style for greater variety of available formats.

  • Version 0.3.8 (2024-04-12):
    • New method.

shape[source]

The current dimensions of the model as a tuple of (rows, columns).

Type:

tuple[int, int]

sort(by: str | int | Iterable[str | int] = None, asc: bool = True) SQLDataModel[source]

Sort the dataset by the specified column or columns. If no value is specified, the current SQLDataModel.sql_idx column is used with the default ordering asc = True.

Parameters:
  • by (str | int | Iterable[str | int], optional) – The column or list of columns by which to sort the dataset. Defaults to sorting by the dataset’s index.

  • asc (bool, optional) – If True, sort in ascending order; if False, sort in descending order. Defaults to ascending order.

Raises:
  • TypeError – If value for by argument is not one of type ‘str’, ‘int’ or ‘list’.

  • ValueError – If a specified column in by is not found in the current dataset or is an invalid column.

  • IndexError – If columns are indexed by integer but are outside of the current model range.

Returns:

A new instance of SQLDataModel with rows sorted according to the specified ordering.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01'),
    ('Sarah', 'West', 39, 0.7, '2023-10-01'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Sort by last name column
sorted_df = df.sort('last')

# View sorted model
print(sorted_df)

This will output:

┌───┬───────┬─────────┬──────┬─────────┬────────────┐
│   │ first │ last    │  age │ service │ hire_date  │
├───┼───────┼─────────┼──────┼─────────┼────────────┤
│ 0 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │
│ 1 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │
│ 2 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │
│ 3 │ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │
│ 4 │ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │
└───┴───────┴─────────┴──────┴─────────┴────────────┘
[5 rows x 5 columns]

Sort by multiple columns:

# Sort by multiple columns in descending order
sorted_df = df.sort(['age','hire_date'], asc=False)

# View sorted
print(sorted_df)

This will output:

┌───┬───────┬─────────┬──────┬─────────┬────────────┐
│   │ first │ last    │  age │ service │ hire_date  │
├───┼───────┼─────────┼──────┼─────────┼────────────┤
│ 0 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │
│ 1 │ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │
│ 2 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │
│ 3 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │
│ 4 │ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │
└───┴───────┴─────────┴──────┴─────────┴────────────┘
[5 rows x 5 columns]

Note

  • Standard sorting process for sqlite3 is used, whereby the ordering prefers the first column mentioned to the last.

  • Ascending and descending ordering follows this order of operations for multiple columns as well.
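
In plain SQLite terms, a multi-column sort corresponds to an ORDER BY clause listing the columns in priority order. The sketch below reproduces both orderings from the example with the standard library sqlite3 module. The helper sorted_first_names is hypothetical, and applying the same direction to every column is an assumption based on the notes above.

```python
import sqlite3

headers = ["first", "last", "age", "service", "hire_date"]
data = [
    ("John", "Smith", 27, 1.22, "2023-02-01"),
    ("Sarah", "West", 39, 0.7, "2023-10-01"),
    ("Mike", "Harlin", 36, 3.9, "2020-08-27"),
    ("Pat", "Douglas", 42, 11.5, "2015-11-06"),
    ("Kelly", "Lee", 32, 8.0, "2016-09-18"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sdm (first TEXT, last TEXT, age INTEGER, service REAL, hire_date TEXT)")
conn.executemany("INSERT INTO sdm VALUES (?, ?, ?, ?, ?)", data)

def sorted_first_names(by, asc=True):
    """Return first names ordered by the given columns (hypothetical helper)."""
    for col in by:
        if col not in headers:
            raise ValueError(f"column {col!r} not found")
    direction = "ASC" if asc else "DESC"
    # Earlier columns take priority; later ones only break ties
    order_by = ", ".join(f'"{col}" {direction}' for col in by)
    return [row[0] for row in conn.execute(f"SELECT first FROM sdm ORDER BY {order_by}")]

print(sorted_first_names(["last"]))
# ['Pat', 'Mike', 'Kelly', 'John', 'Sarah']
print(sorted_first_names(["age", "hire_date"], asc=False))
# ['Pat', 'Sarah', 'Mike', 'Kelly', 'John']
```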

Changelog:
  • Version 0.8.0 (2024-06-21):
    • Modified to allow mixed integer and value indexing for columns sort order in by argument to reflect similar flexibility for column input across package.

  • Version 0.5.1 (2024-05-10):
    • Modified to allow integer indexing for column sort order in by argument.

  • Version 0.1.9 (2024-03-19):
    • New method.

sql_db_conn[source]

The in-memory sqlite3 connection object in use by the model.

Type:

sqlite3.Connection

sql_idx[source]

The index column name applied to the sqlite3 in-memory representation of the model. Default is 'idx'.

Type:

str

sql_model[source]

The table name applied to the sqlite3 in-memory representation of the model. Default is 'sdm'.

Type:

str

startswith(pat: str | Iterable[str], case: bool = True) set[int][source]

Return the indices of rows in which any column value starts with the specified pattern(s), converting each value with str(value) for comparison.

Parameters:
  • pat (str | Iterable[str]) – The pattern or iterable of patterns to search for within the data.

  • case (bool, optional) – If True (default), the search is case-sensitive. If False, the search is case-insensitive.

Raises:

TypeError – If argument for pat is not of type ‘str’ or an iterable of type ‘str’ representing the substring pattern(s).

Returns:

Set of row indices containing values that match the pattern(s).

Return type:

set[int]

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Sex', 'City']
data = [
    ('Mike', 31, 'M', 'Chicago'),
    ('John', 25, 'M', 'Dayton'),
    ('Alice', 27, 'F', 'Boston'),
    ('Sarah', 35, 'F', 'Houston'),
    ('Bobby', 42, 'M', 'Chicago'),
    ('Steve', 28, 'F', 'Austin'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter for rows where the 'City' column starts with the string 'Chi'
matching_indices = df['City'].startswith('Chi')

# Apply filter to model
df_city = df[matching_indices]

# View result
print(df_city)

This will output the result of applying the filter to the model:

┌───┬───────┬─────┬─────┬─────────┐
│   │ Name  │ Age │ Sex │ City    │
├───┼───────┼─────┼─────┼─────────┤
│ 0 │ Mike  │  31 │ M   │ Chicago │
│ 4 │ Bobby │  42 │ M   │ Chicago │
└───┴───────┴─────┴─────┴─────────┘
[2 rows x 4 columns]

Instead of searching a single column, the entire model can be searched:

# Method can also search all columns, and be applied directly
df_prefix = df[df.startswith('A', case=False)]

# View result
print(df_prefix)

This will output the result of a case-insensitive search:

┌───┬───────┬─────┬─────┬─────────┐
│   │ Name  │ Age │ Sex │ City    │
├───┼───────┼─────┼─────┼─────────┤
│ 2 │ Alice │  27 │ F   │ Boston  │
│ 5 │ Steve │  28 │ F   │ Austin  │
└───┴───────┴─────┴─────┴─────────┘
[2 rows x 4 columns]

This can be used in combination with the setitem syntax to selectively update values as well:

# Create a 'State' column with a default value
df['State'] = None

# Filter and set the values that start with the pattern
df[df.startswith('Chi'), 'State'] = 'Illinois'

# Multiple conditions can be used
tx_1 = df.startswith('Hou')
tx_2 = df.startswith('Aus')

# Then chained together using set notation
df[(tx_1 | tx_2), 'State'] = 'Texas'

# Alternatively, an iterable of patterns can be provided
df[df.startswith(['Hou','Aus']), 'State'] = 'Texas'
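
The matching rule can be sketched in plain Python: a row index is included when the string form of any value in that row starts with one of the patterns. The function startswith_rows is a hypothetical stand-in that works on a list of tuples, not on a SQLDataModel.

```python
def startswith_rows(rows, pat, case=True):
    """Return the set of row indices where any value starts with a pattern."""
    patterns = (pat,) if isinstance(pat, str) else tuple(pat)
    if not all(isinstance(p, str) for p in patterns):
        raise TypeError("pat must be a str or an iterable of str")
    if not case:
        patterns = tuple(p.lower() for p in patterns)
    matches = set()
    for idx, row in enumerate(rows):
        for value in row:
            text = str(value) if case else str(value).lower()
            # str.startswith accepts a tuple of candidate prefixes
            if text.startswith(patterns):
                matches.add(idx)
                break
    return matches

data = [
    ("Mike", 31, "M", "Chicago"),
    ("John", 25, "M", "Dayton"),
    ("Alice", 27, "F", "Boston"),
    ("Sarah", 35, "F", "Houston"),
    ("Bobby", 42, "M", "Chicago"),
    ("Steve", 28, "F", "Austin"),
]

print(sorted(startswith_rows(data, "Chi")))             # [0, 4]
print(sorted(startswith_rows(data, "a", case=False)))   # [2, 5]
print(sorted(startswith_rows(data, ["Hou", "Aus"])))    # [3, 5]
```

Because the result is a set, separate searches can be combined with the usual set operators such as | and &.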

Changelog:
  • Version 0.7.8 (2024-06-18):
    • New method.

static_py_to_sql_map_dict[source]

The data type mapping to use when converting python types to SQL column types.

Type:

dict

static_sql_to_py_map_dict[source]

The data type mapping to use when converting SQL column types to python types.

Type:

dict

strip(characters: str = None, str_dtype_only: bool = True, inplace: bool = False) SQLDataModel | None[source]

Removes the specified characters from the beginning and end of each value in the current SQLDataModel, removing leading and trailing whitespace characters by default.

Parameters:
  • characters (str, optional) – The characters to remove from both ends of the value. Default is None, removing whitespace (' ', '\t', '\n', '\r').

  • str_dtype_only (bool, optional) – If True, only columns with dtype = ‘str’ are stripped, otherwise all columns are stripped. Default is True.

  • inplace (bool, optional) – If True, modifies the current SQLDataModel instance in-place. Default is False.

Raises:

TypeError – If characters argument is provided and is not of type 'str' representing unordered characters to remove.

Returns:

If inplace=False, returns a new SQLDataModel with the stripped values. Otherwise modifies the current instance in-place returning None.

Return type:

SQLDataModel | None

Example:

import sqldatamodel as sdm

# Create a single item model
df = sdm.SQLDataModel([[' Hello, World! ']])

# Strip whitespace and print
print(df.strip())

This will output the model after stripping the leading and trailing whitespace characters:

┌───┬───────────────┐
│   │ 0             │
├───┼───────────────┤
│ 0 │ Hello, World! │
└───┴───────────────┘
[1 rows x 1 columns]

Non-whitespace characters can also be stripped:

import sqldatamodel as sdm

headers = ['Col A', 'Col B', 'Col C']
data = [
    ['A1', 'B1', 'C1'],
    ['A2', 'B2', 'C2'],
    ['A3', 'B3', 'C3']
]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Strip leading and trailing 'A' character
df_stripped = df.strip('A')

# View result
print(df_stripped)

This will output a new model where any leading and trailing ‘A’ characters have been removed:

┌───────┬───────┬───────┐
│ Col A │ Col B │ Col C │
├───────┼───────┼───────┤
│ 1     │ B1    │ C1    │
│ 2     │ B2    │ C2    │
│ 3     │ B3    │ C3    │
└───────┴───────┴───────┘
[3 rows x 3 columns]

Multiple characters can be stripped, and the model modified inplace:

# Strip multiple characters and this time modify model inplace
df.strip('123', inplace=True)

# View result
print(df)

This will output the modified model after stripping leading and trailing ‘123’ characters:

┌───────┬───────┬───────┐
│ Col A │ Col B │ Col C │
├───────┼───────┼───────┤
│ A     │ B     │ C     │
│ A     │ B     │ C     │
│ A     │ B     │ C     │
└───────┴───────┴───────┘
[3 rows x 3 columns]

Note

  • For string replacement instead of string removal, see SQLDataModel.replace().

  • When using str_dtype_only = False, numeric values may be modified due to SQLite’s type affinity rules.

  • This method is equivalent to the SQLite trim(string, character) function, wrapping and passing the equivalent arguments.
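The SQLite behavior referenced in the notes above can be checked directly with the standard library sqlite3 module. This sketch only exercises SQLite's own trim() function, not SQLDataModel, and the sample values are illustrative:

```python
import sqlite3

con = sqlite3.connect(':memory:')

# Single-argument trim removes leading and trailing spaces
stripped = con.execute("SELECT trim(?)", (' Hello, World! ',)).fetchone()[0]

# Two-argument trim removes any of the listed characters from both ends
no_a = con.execute("SELECT trim(?, ?)", ('A1', 'A')).fetchone()[0]
no_digits = con.execute("SELECT trim(?, ?)", ('A1', '123')).fetchone()[0]

# A numeric input is converted to text before trimming, so the value changes
trimmed, sql_type = con.execute(
    "SELECT trim(1230, '0'), typeof(trim(1230, '0'))"
).fetchone()

con.close()
print(repr(stripped), no_a, no_digits, trimmed, sql_type)
```

The last query shows why str_dtype_only = False deserves care: the integer 1230 comes back as the text '123'.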

Changelog:
  • Version 0.4.3 (2024-05-07):
    • New method.

table_style[source]

The table style used for string representations of the model. Available styles are 'ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid' or 'rst-simple'. Defaults to the 'default' table style.

Type:

str

tail(n_rows: int = 5) SQLDataModel[source]

Returns the last n_rows of the current SQLDataModel.

Parameters:

n_rows (int, optional) – Number of rows to return. Defaults to 5.

Raises:

TypeError – If n_rows argument is not of type ‘int’ representing the number of rows to return from the tail of the model.

Returns:

A new SQLDataModel instance containing the specified number of rows.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Countries data available for sample dataset
url = 'https://developers.google.com/public-data/docs/canonical/countries_csv'

# Create the model
df = sdm.from_html(url)

# Get tail of model
df_tail = df.tail()

# View it
print(df_tail)

This will grab the bottom 5 rows by default:

┌─────┬─────────┬──────────┬───────────┬───────────────────┐
│     │ country │ latitude │ longitude │ name              │
├─────┼─────────┼──────────┼───────────┼───────────────────┤
│ 240 │ WF      │ -13.7688 │ -177.1561 │ Wallis and Futuna │
│ 241 │ EH      │  24.2155 │  -12.8858 │ Western Sahara    │
│ 242 │ YE      │  15.5527 │   48.5164 │ Yemen             │
│ 243 │ ZM      │ -13.1339 │   27.8493 │ Zambia            │
│ 244 │ ZW      │ -19.0154 │   29.1549 │ Zimbabwe          │
└─────┴─────────┴──────────┴───────────┴───────────────────┘
[5 rows x 4 columns]

Note

  • See related SQLDataModel.head() for the opposite, grabbing the top n_rows from the current model.
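One way to picture selecting the last n_rows in SQL is a descending LIMIT wrapped in an ascending sort. The sketch below is an assumption about how such a query could look, not the method's actual implementation; the table `t`, its columns and the `tail_rows` helper are hypothetical:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE t (idx INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row {i}") for i in range(10)])

def tail_rows(connection, n_rows=5):
    """Return the last n_rows of table t, kept in ascending index order."""
    if not isinstance(n_rows, int):
        raise TypeError("n_rows must be of type 'int'")
    # Inner query takes the highest indexes, outer query restores the order
    query = (
        "SELECT idx, name FROM "
        "(SELECT idx, name FROM t ORDER BY idx DESC LIMIT ?) "
        "ORDER BY idx"
    )
    return connection.execute(query, (n_rows,)).fetchall()

last_three = tail_rows(con, 3)
last_default = tail_rows(con)
con.close()
print(last_three)
```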

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

to_csv(filename: str = None, delimiter: str = ',', quotechar: str = '"', lineterminator: str = '\r\n', na_rep: str = 'None', encoding: str = 'utf-8', index: bool = False, **kwargs) str | None[source]

Writes SQLDataModel to the specified file if the filename argument is provided, otherwise returns the model directly as a CSV formatted string literal.

Parameters:
  • filename (str) – The name of the CSV file to which the data will be written. Default is None, returning as raw literal.

  • delimiter (str, optional) – The delimiter to use for separating values. Default is ‘,’.

  • quotechar (str, optional) – The character used to quote fields. Default is ‘"’.

  • lineterminator (str, optional) – The character used to terminate the row and move to a new line. Default is ‘\r\n’.

  • na_rep (str, optional) – String representation to use for null or missing values. Default is ‘None’.

  • encoding (str, optional) – The encoding to use when writing the model to a CSV file. Default is ‘utf-8’.

  • index (bool, optional) – If True, includes the index in the CSV file; if False, excludes the index. Default is False.

  • **kwargs – Additional arguments to be passed to the csv.writer constructor.

Returns:

If filename is None, returns the model as a delimited string literal. If filename is provided, writes the model to the specified file as CSV and returns None.

Return type:

str | None

Example:

Returning CSV Literal

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Generate the literal using tab delimiter
csv_literal = df.to_csv(delimiter='\t')

# View output
print(csv_literal)

This will output:

Name    Age     Height
John    30      175.3
Alice   28      162.0
Travis  35      185.8

Write to File

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# CSV filename
csv_file = 'persons.csv'

# Write to the file, keeping the index
df.to_csv(filename=csv_file, index=True)

Contents of persons.csv:

idx,Name,Age,Height
0,John,30,175.3
1,Alice,28,162.0
2,Travis,35,185.8

Note

  • When index=True, the sdm_index property determines the column name of the index in the result.

  • Modifying delimiter affects how the data is delimited when writing to filename and when returning as raw literal, any valid delimiter can be used.

  • Quoting behavior can be modified by providing an additional keyword argument such as quoting=1 to wrap all values in quotes, or quoting=2 to quote only non-numeric values, see csv.QUOTE_X enums for all options.

  • Use SQLDataModel.to_text() to pretty print table in specified style for visualizing output if strict delimiting is unnecessary.

  • See SQLDataModel.from_csv() for creating a new SQLDataModel from existing CSV data.
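The quoting values mentioned in the notes above are the constants of the standard library csv module. The sketch below uses csv.writer on its own to show what quoting=1 and quoting=2 produce; the `render` helper is hypothetical and not part of SQLDataModel:

```python
import csv
import io

headers = ['Name', 'Age', 'Height']
data = [('John', 30, 175.3), ('Alice', 28, 162.0)]

def render(quoting):
    """Write the sample rows with csv.writer using the given quoting constant."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer, delimiter=',', quotechar='"', lineterminator='\r\n', quoting=quoting
    )
    writer.writerow(headers)
    writer.writerows(data)
    return buffer.getvalue()

# csv.QUOTE_ALL is 1: every field is wrapped in quotes
quote_all = render(csv.QUOTE_ALL)

# csv.QUOTE_NONNUMERIC is 2: only non-numeric fields are quoted
quote_text = render(csv.QUOTE_NONNUMERIC)

print(quote_all)
print(quote_text)
```

Passing the named constant rather than the bare integer tends to read more clearly at the call site.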

Changelog:
  • Version 0.6.4 (2024-05-17):
    • Added encoding parameter to pass to file handler when writing contents as CSV file and set default to utf-8 to align with expected SQLite codec.

  • Version 0.4.0 (2024-04-23):
    • Modified quoting behavior to avoid redundant quoting and to closely mimic csv module from standard library.

    • Added na_rep to fill null or missing values when generating output, useful for space delimited data and minimal quoting.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

to_dict(orient: Literal['rows', 'columns', 'list'] = 'rows', index: bool = None) dict | list[dict][source]

Converts the SQLDataModel instance to a dictionary or a list of dictionaries based on the specified orientation.

Parameters:
  • orient (Literal["rows", "columns", "list"]) – The orientation of the output, see examples for more detail. "rows": Returns a dictionary with index values as keys and row values as values. "columns": Returns a dictionary with column names as keys and column values as tuples. "list": Returns a list of dictionaries, where each dictionary represents a row.

  • index (bool) – Whether to include the index column in the output. Defaults to the display_index property.

Raises:

ValueError – If value for orient is not one of “rows”, “columns” or “list”.

Returns:

The converted data structure based on the specified orientation.

Return type:

dict | list[dict]

Examples:

Orient by Rows

import sqldatamodel as sdm

# Sample data
headers = ['Col A','Col B', 'Col C']
data = [
    ['A,0', 'B,0', 'C,0'],
    ['A,1', 'B,1', 'C,1'],
    ['A,2', 'B,2', 'C,2']
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Convert to dictionary with rows as keys and values
rows_dict = df.to_dict(orient="rows")

# View output
for k, v in rows_dict.items():
    print(f"{k}: {v}")

This will output:

0: ('A,0', 'B,0', 'C,0')
1: ('A,1', 'B,1', 'C,1')
2: ('A,2', 'B,2', 'C,2')

Orient by Columns

# Convert to dictionary with columns as keys and rows as values
columns_dict = df.to_dict(orient="columns")

# View output
for k, v in columns_dict.items():
    print(f"{k}: {v}")

This will output:

Col A: ('A,0', 'A,1', 'A,2')
Col B: ('B,0', 'B,1', 'B,2')
Col C: ('C,0', 'C,1', 'C,2')

Orient by List

# Convert to list of dictionaries with each dictionary representing a row with columns as keys
list_dict = df.to_dict(orient="list")

# View output
for row in list_dict:
    print(row)

This will output:

{'Col A': 'A,0', 'Col B': 'B,0', 'Col C': 'C,0'}
{'Col A': 'A,1', 'Col B': 'B,1', 'Col C': 'C,1'}
{'Col A': 'A,2', 'Col B': 'B,2', 'Col C': 'C,2'}

Note

  • Use index to return index data, otherwise current instance display_index value will be used.

  • For 'list' orientation, data returned is JSON-like in structure, where each row has its own “column”: “value” data.
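The three orientations can be expressed in a few lines of plain Python. The sketch below is an illustration of the shapes described above, not the library's code; `to_dict_sketch` is a hypothetical helper working on a header list and row tuples, with the index omitted:

```python
headers = ['Col A', 'Col B', 'Col C']
rows = [
    ('A,0', 'B,0', 'C,0'),
    ('A,1', 'B,1', 'C,1'),
    ('A,2', 'B,2', 'C,2'),
]

def to_dict_sketch(headers, rows, orient='rows'):
    """Reshape row tuples into one of the three documented orientations."""
    if orient == 'rows':
        # Row position as key, row values as tuple
        return {idx: tuple(row) for idx, row in enumerate(rows)}
    if orient == 'columns':
        # zip(*rows) transposes the rows into columns
        return {header: tuple(col) for header, col in zip(headers, zip(*rows))}
    if orient == 'list':
        # One dictionary per row, JSON-like in structure
        return [dict(zip(headers, row)) for row in rows]
    raise ValueError("orient must be one of 'rows', 'columns' or 'list'")

by_rows = to_dict_sketch(headers, rows, 'rows')
by_columns = to_dict_sketch(headers, rows, 'columns')
by_list = to_dict_sketch(headers, rows, 'list')
print(by_rows[0], by_columns['Col A'], by_list[0])
```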

Changelog:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.5 (2023-11-24):
    • New method.

to_excel(filename: str, worksheet: int | str = 1, index: bool = False, if_exists: Literal['append', 'replace', 'fail'] = 'replace') None[source]

Writes the current SQLDataModel to the specified Excel filename.

Parameters:
  • filename (str) – The file path to save the Excel file, e.g., filename = 'output.xlsx'.

  • worksheet (int | str, optional) – The index or name of the worksheet to write to. Defaults to 1, indicating the first worksheet.

  • index (bool, optional) – If SQLDataModel index should be included in the output. Default is False.

  • if_exists (Literal['append','replace','fail']) – Action to take if file already exists. Default is ‘replace’, overwriting existing file.

Raises:
  • ModuleNotFoundError – If the required package openpyxl is not installed as determined by optionals._has_xl flag.

  • TypeError – If the filename argument is not of type ‘str’ representing a valid Excel file path to create or write to.

  • ValueError – If if_exists is not one of ‘append’, ‘replace’ or ‘fail’ representing action to take if file exists.

  • IndexError – If worksheet is provided as type ‘int’ but is out of range of the available worksheets.

  • Exception – If any unexpected exception occurs during the Excel writing and saving process.

Returns:

If successful, a new Excel file filename is created and None is returned.

Return type:

None

Example:

import openpyxl
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Rate', 'Gender']
data = [
    ('Alice', 25, 26.50, 'Female'),
    ('Bob', 30, 21.25, 'Male'),
    ('Will', 35, 24.00, 'Male'),
    ('Mary', 32, 23.75, 'Female')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Export into a new Excel file
df.to_excel('Team-Overview.xlsx')

# Or append to existing Excel file as a new worksheet
df.to_excel('Team.xlsx', worksheet='Demographics', if_exists='append')

This will create a new Excel file Team-Overview.xlsx:

    ┌───────┬──────┬────────┬────────┐
    │ A     │  B   │ C      │ D      │
┌───┼───────┼──────┼────────┼────────┤
│ 1 │ Name  │  Age │   Rate │ Gender │
│ 2 │ Alice │   25 │  26.50 │ Female │
│ 3 │ Bob   │   30 │  21.25 │ Male   │
│ 4 │ Will  │   35 │  24.00 │ Male   │
│ 5 │ Mary  │   32 │  23.75 │ Female │
└───┴───────┴──────┴────────┴────────┘
[ Sheet1 ]

Note

  • Headers are dynamically inserted based on value for if_exists, where using ‘replace’ will include headers and ‘append’ will ignore them unless worksheet creation occurred.

  • When providing a string argument for worksheet, if the sheet does not exist, it will be created. However if providing an integer index for an out of range sheet, an IndexError will be raised.

  • See related SQLDataModel.from_excel() for creating a SQLDataModel from existing Excel content.

Changelog:
  • Version 0.8.1 (2024-06-23):
    • Added if_exists parameter to provide the options to replace or append to existing file, as well as to fail if already exists.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.2.2 (2024-03-26):
    • New method.

to_html(filename: str = None, index: bool = None, encoding: str = 'utf-8', style_params: dict = None) str[source]

Returns the current SQLDataModel as a lightly formatted HTML <table> element as a string if filename is None. If filename is specified, writes the HTML to the specified file as .html and returns None.

Parameters:
  • filename (str) – The file path to save the HTML content. If None, returns the HTML as a string (default is None).

  • index (bool) – Whether to include the index column in the HTML table (default is current display_index).

  • encoding (str) – Character encoding to use when writing model to HTML file, default set to 'utf-8'.

  • style_params (dict) – A dictionary representing CSS styles {property: value} to customize the appearance of the HTML table (default is None).

Raises:
  • TypeError – If filename is not a valid string when specified or if style_params is not a dictionary when specified.

  • OSError – If encountered while trying to open and write the HTML to the file.

Returns:

If filename is None, returns the HTML content as a string. If filename is specified, writes to the file and returns None.

Return type:

str | None

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.SQLDataModel(data=[(1, 'John'), (2, 'Doe')], headers=['ID', 'Name'])

# Create and save as new html file
df.to_html('output.html', style_params={'font-size': '12pt'})

# Get HTML as a string
html_string = df.to_html()

# View output
print(html_string)

This will output:

<table>
    <tr>
        <th>ID</th>
        <th>Name</th>
    </tr>
    <tr>
        <td>1</td>
        <td>John</td>
    </tr>
    <tr>
        <td>2</td>
        <td>Doe</td>
    </tr>
</table>
<style>
    table {font-size: 12pt;}
</style>

Note

  • Base styles are applied to reflect the styling of SQLDataModel in the terminal, including any display_color which is applied to the table CSS.

  • Table index is determined by the instance display_index attribute unless specified in the argument of the same name, overriding the instance attribute.

  • The default background-color is #E5E5E5, and the default font color is #090909, with 1 px solid border to mimic the repr for the instance.
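A {property: value} dictionary like style_params maps naturally onto a CSS rule. The sketch below shows one way such a rule could be rendered; `css_rule` is a hypothetical helper, and the exact markup produced by to_html() may differ:

```python
def css_rule(selector, style_params):
    """Render a CSS rule from a {property: value} dictionary."""
    declarations = ' '.join(f"{prop}: {value};" for prop, value in style_params.items())
    # Doubled braces emit literal '{' and '}' around the declarations
    return f"{selector} {{{declarations}}}"

single = css_rule('table', {'font-size': '12pt'})
several = css_rule('table', {'font-size': '12pt', 'color': '#090909'})

style_block = "<style>\n    " + single + "\n</style>"
print(style_block)
```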

Changelog:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.9 (2024-03-19):
    • New method.

to_json(filename: str = None, index: bool = None, **kwargs) list | None[source]

Converts the SQLDataModel instance to JSON format. If filename is specified, the JSON is written to the file; otherwise, a JSON-like object is returned.

Parameters:
  • filename (str) – The path to the file where JSON will be written. If None, no file is created and JSON-like object is returned.

  • index (bool) – Whether to include the index column in the JSON. Defaults to the display_index property.

  • **kwargs – Additional keyword arguments to pass to the json.dump() method.

Raises:
  • TypeError – If filename is not of type ‘str’.

  • Exception – If there is an OS related error encountered when opening or writing to the provided filename.

Returns:

If filename is None, a list containing a JSON-like object is returned. Otherwise JSON file created and returns None.

Return type:

list | None

Examples:

To JSON Literal

import sqldatamodel as sdm

# Sample JSON to first create model
json_source = [
    {"id": 1, "color": "red", "value": "#f00", "notes": "primary"}
    ,{"id": 2, "color": "green", "value": "#0f0", "notes": None}
    ,{"id": 3, "color": "blue", "value": "#00f", "notes": "primary"}
]

# Create the model
df = sdm.from_json(json_source)

# View current state
print(df)

This will output:

┌─────┬───────┬───────┬─────────┐
│  id │ color │ value │ notes   │
├─────┼───────┼───────┼─────────┤
│   1 │ red   │ #f00  │ primary │
│   2 │ green │ #0f0  │         │
│   3 │ blue  │ #00f  │ primary │
└─────┴───────┴───────┴─────────┘
[3 rows x 4 columns]

Write JSON File

# Write model to JSON file
df.to_json('output.json')

# Or convert to JSON-like object
json_data = df.to_json()

# View JSON object
print(json_data)

This will output:

[{
    "id": 1,
    "color": "red",
    "value": "#f00",
    "notes": "primary"
},
{
    "id": 2,
    "color": "green",
    "value": "#0f0",
    "notes": null
},
{
    "id": 3,
    "color": "blue",
    "value": "#00f",
    "notes": "primary"
}]

Note

  • When no filename is specified, JSON-like object will be returned as a rowwise array.

  • Any nested structure will be flattened by this method as well as the SQLDataModel.from_json() method.
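Since additional keyword arguments are passed to json.dump(), the usual standard library options apply when writing a file. The sketch below uses the json module on its own with a rowwise array like the one shown above; an in-memory buffer stands in for the output file:

```python
import io
import json

# Rowwise array: one dictionary per row, None for missing values
rows = [
    {"id": 1, "color": "red", "value": "#f00", "notes": "primary"},
    {"id": 2, "color": "green", "value": "#0f0", "notes": None},
]

# indent and sort_keys are ordinary json.dump keyword arguments
buffer = io.StringIO()
json.dump(rows, buffer, indent=4, sort_keys=True)
json_text = buffer.getvalue()

# Python None is serialized as JSON null and restored on load
restored = json.loads(json_text)
print(json_text)
```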

Changelog:
  • Version 0.3.2 (2024-04-02):
    • Changed return object to JSON string literal when filename=None to convert to valid literal object.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.9 (2024-03-19):
    • New method.

to_latex(filename: str = None, index: bool = False, bold_headers: bool = False, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, horizontal_ellipses: str = '⠤⠄', index_rep: str = None, format_output_as: Literal['table', 'document'] = 'table', column_alignment: Literal['left', 'center', 'right', 'dynamic'] = None) str | None[source]

Returns the current SQLDataModel as a LaTeX table string if filename is None, otherwise writes the table to the provided file as a LaTeX document.

Parameters:
  • filename (str, optional) – The name of the file to write the LaTeX content. If not provided, the LaTeX content is returned as a string. Default is None.

  • index (bool, optional) – Whether to include the index column in the LaTeX output. Default is False.

  • bold_headers (bool, optional) – Whether the headers should be bolded in the LaTeX table. Default is False.

  • min_column_width (int, optional) – The minimum column width for table cells. Default is current value set on attribute SQLDataModel.min_column_width.

  • max_column_width (int, optional) – The maximum column width for table cells. Default is current value set on attribute SQLDataModel.max_column_width.

  • float_precision (int, optional) – The precision for floating-point values. Default is current value set on SQLDataModel.display_float_precision.

  • horizontal_ellipses (str, optional) – Characters to truncate column and cell values when horizontal truncation is required. Default is '⠤⠄'.

  • index_rep (str, optional) – String representation for the index. Default is None, using value set on SQLDataModel.sql_idx to represent the index column. Only used when generating index column, otherwise ignored when index = False.

  • format_output_as (Literal['table', 'document'], optional) – Whether the output should be formatted as a LaTeX table or as a standalone document. Default is ‘table’, formatting output as a modular table element.

  • column_alignment (Literal['left', 'center', 'right', 'dynamic'], optional) – The column alignment to use. Default is current value set on attribute SQLDataModel.column_alignment.

Returns:

If filename is None, returns the LaTeX formatted table as a string, if filename is provided, writes the LaTeX table to the specified file and returns None.

Return type:

str or None

Raises:
  • TypeError – If the filename argument is not of type ‘str’, index argument is not of type ‘bool’, min_column_width or max_column_width argument is not of type ‘int’.

  • ValueError – If format_output_as is not one of ‘table’, ‘document’, or column_alignment provided and is not one of ‘left’, ‘center’, ‘right’, ‘dynamic’.

  • Exception – If there is an OS related error encountered when opening or writing to the provided filename.

LaTeX Formatting:
  • LaTeX output format that is generated can be set by format_output_as which provides one of two formats:

    • 'table': Output formatted as insertable table, beginning and ending with LaTeX \begin{table} and \end{table} respectively.

    • 'document': Output formatted as standalone document, beginning and ending with LaTeX \begin{document} and \end{document} respectively.

  • LaTeX table alignment will follow the SQLDataModel instance alignment, set by SQLDataModel.set_column_alignment():

    • 'dynamic': Dynamically aligns column content, right for numeric types and left for remaining types.

    • 'left': Left-aligns all column content, equivalent to LaTeX column format: |l|.

    • 'center': Center-aligns all column content preferring left on uneven splits, equivalent to LaTeX column format: |c|.

    • 'right': Right-aligns all column content, equivalent to LaTeX column format: |r|.

  • The LaTeX rows generated will use dynamic alignment regardless of the column_alignment provided. This keeps the source format consistent and does not affect the alignment rendered by LaTeX.
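The alignment rules above determine the column format passed to the tabular environment. The sketch below shows how such a specification could be derived; `tabular_spec` and its dtype names are hypothetical and only mirror the rules listed here:

```python
def tabular_spec(dtypes, column_alignment='dynamic'):
    """Build a LaTeX column format such as |l|r|r| from column dtypes."""
    fixed = {'left': 'l', 'center': 'c', 'right': 'r'}
    if column_alignment == 'dynamic':
        # Right-align numeric columns, left-align everything else
        letters = ['r' if dtype in ('int', 'float') else 'l' for dtype in dtypes]
    elif column_alignment in fixed:
        letters = [fixed[column_alignment]] * len(dtypes)
    else:
        raise ValueError("column_alignment must be 'dynamic', 'left', 'center' or 'right'")
    return '|' + '|'.join(letters) + '|'

dynamic_spec = tabular_spec(['str', 'int', 'float'])
center_spec = tabular_spec(['str', 'int', 'float'], 'center')
print(dynamic_spec, center_spec)
```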

Examples:

Returning LaTeX Literal

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data=data, headers=headers)

# Generate LaTeX table literal
latex_output = df.to_latex()

# View LaTeX output
print(latex_output)

This will output:

\begin{tabular}{|l|r|r|}
\hline
    {Name} & {Age} & {Height} \\
\hline
    John    &   30 &  175.30 \\
    Alice   &   28 &  162.00 \\
    Michael &   35 &  185.80 \\
\hline
\end{tabular}

Write to LaTeX File

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data=data, headers=headers)

# Write the output to the file, formatting the output as a proper LaTeX document
df.to_latex(filename='Table.tex', format_output_as='document')

Contents of file Table.tex:

\documentclass{article}
\begin{document}
\begin{table}[h]
\centering
\begin{tabular}{|l|r|r|}
\hline
    {Name} & {Age} & {Height} \\
\hline
    John    &   30 &  175.30 \\
    Alice   &   28 &  162.00 \\
    Michael &   35 &  185.80 \\
\hline
\end{tabular}
\end{table}
\end{document}

Note

  • A \centering command is included in the LaTeX output by default regardless of alignments specified.

  • LaTeX headers and rows are indented by four spaces to keep with conventional table syntax and to distinguish the table data from commands.

  • Table commands and headers are checked for characters that are invalid in LaTeX, such as '_' and '#', and escaped, however the model data is not.

    Accordingly, ensure any model content is valid LaTeX when rendering to PDF, or simply format content as valid LaTeX before exporting.
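Because model data is not escaped, values can be prepared before the model is created. The sketch below shows one possible approach; `escape_latex` is a hypothetical helper and covers only the characters that can be escaped with a simple backslash:

```python
# Characters that LaTeX treats specially and that accept a backslash escape
LATEX_SPECIAL = {
    '#': r'\#', '$': r'\$', '%': r'\%', '&': r'\&',
    '_': r'\_', '{': r'\{', '}': r'\}',
}

def escape_latex(text):
    """Escape simple LaTeX special characters in a string value."""
    return ''.join(LATEX_SPECIAL.get(char, char) for char in text)

raw_rows = [('unit_cost #1', 12.5), ('50% off', 3.0)]

# Escape only the string cells, leave numeric values untouched
safe_rows = [
    tuple(escape_latex(cell) if isinstance(cell, str) else cell for cell in row)
    for row in raw_rows
]
print(safe_rows)
```

Characters such as the backslash, '~' and '^' need dedicated commands in LaTeX and are deliberately left out of this sketch.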

Changelog:
  • Version 0.11.0 (2024-07-05):
    • Added float_precision parameter to align with similar format specific methods and provide additional formatting options.

    • Added horizontal_ellipses parameter to allow customizing truncation characters used when column or cell values exceed maximum column widths.

    • Added index_rep parameter to allow customizing index column name with prior behavior set as default representation. Ignored when index = False.

    • Modified to use SQLDataModel.to_string() instead of generating independently formatted repr for more consistency between tabular outputs.

    • Modified to check and escape any invalid LaTeX characters or symbols when generating headers.

  • Version 0.10.4 (2024-07-03):
    • Modified to escape newline characters through utils.sqlite_printf_format() to avoid wrapping table rows.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.9 (2024-03-19):
    • New method.

to_list(index: bool = False, include_headers: bool = False) list[source]

Returns the current SQLDataModel data as a 1-dimensional list of values if data dimensions are compatible with flattening, or as a list of lists if data is 2-dimensional. Data is returned without index or headers by default, use index = True or include_headers = True to modify.

Parameters:
  • index (bool, optional) – If True, includes the index in the result, if False, excludes the index. Default is False.

  • include_headers (bool, optional) – If True, includes column headers in the result, if False, excludes headers. Default is False.

Returns:

The flattened list of values corresponding to the model data.

Return type:

list

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('Beth', 27, 172.4),
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get all model data as a list of lists
model_data = df.to_list()

# Iterate over each row
for row in model_data:
    print(row)

This will output:

['Beth', 27, 172.4]
['John', 30, 175.3]
['Alice', 28, 162.0]
['Travis', 35, 185.8]

Data will be flattened into a single dimension if possible, such as when accessing individual columns:

# Get 'Name' column as a list
col_data = df['Name'].to_list()

# View output
print(col_data)

This will output a list containing the values from each row for the column:

['Beth', 'John', 'Alice', 'Travis']

Data will also be flattened when accessing individual rows:

# Get first row as a list with index
row_data = df[0].to_list(index=True)

# View result
print(row_data)

This will output the row’s values including the index:

[0, 'Beth', 27, 172.4]
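The flattening rule can be summarized as: a single row or a single column collapses to one dimension, anything else stays a list of lists. The sketch below restates that rule in plain Python; `to_list_sketch` is a hypothetical helper, not the method's implementation:

```python
def to_list_sketch(rows):
    """Flatten to a 1-dimensional list when there is one row or one column."""
    rows = [list(row) for row in rows]
    if len(rows) == 1:
        # A single row flattens to its values
        return rows[0]
    if rows and all(len(row) == 1 for row in rows):
        # A single column flattens to one value per row
        return [row[0] for row in rows]
    # Otherwise the data stays 2-dimensional
    return rows

table = [('Beth', 27, 172.4), ('John', 30, 175.3)]
as_rows = to_list_sketch(table)
one_column = to_list_sketch([(row[0],) for row in table])
one_row = to_list_sketch(table[:1])
print(as_rows, one_column, one_row)
```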

Note

Changelog:
  • Version 0.5.0 (2024-05-09):
    • Modified behavior to output 1-dimensional list when possible and a list of lists when not possible.

    • Changed default to index = False to increase surface for 1-dimensional flattening.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.5 (2023-11-24):
    • New method.

to_local_db(filename: str) None[source]

Writes the SQLDataModel in-memory database to disk as a SQLite database file using the specified filename.

Parameters:

filename (str) – The filename or filepath to use when writing the model to disk.

Raises:
  • TypeError – If filename is provided and is not of type ‘str’ representing a valid sqlite database save path.

  • sqlite3.Error – If there is an issue with the SQLite database operations during backup.

Returns:

None

Example:

import sqlite3
import sqldatamodel as sdm

# Sample data
data = [('Alice', 20, 'F'), ('Billy', 25, 'M'), ('Chris', 30, 'M')]

# Create the model
df = sdm.SQLDataModel(data, headers=['Name','Age','Sex'])

# Filename to use for database
db_file = "model.db"

# Write the in-memory database model to disk
df.to_local_db(db_file)

# Loading the model back from disk can now be done at anytime
df = sdm.from_sql("sdm", sqlite3.connect(db_file))

# View restored model
print(df)

This will output the model we originally created:

┌───┬───────┬─────┬─────┐
│   │ Name  │ Age │ Sex │
├───┼───────┼─────┼─────┤
│ 0 │ Alice │  20 │ F   │
│ 1 │ Billy │  25 │ M   │
│ 2 │ Chris │  30 │ M   │
└───┴───────┴─────┴─────┘
[3 rows x 3 columns]
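Copying an in-memory SQLite database to disk can be done with the standard library backup API. The sketch below uses sqlite3 directly and assumes, based on the backup wording in the Raises section, that a similar mechanism is involved; the table layout is illustrative, with the table name "sdm" taken from the example above:

```python
import os
import sqlite3
import tempfile

# Build a small in-memory database
memory_con = sqlite3.connect(':memory:')
memory_con.execute("CREATE TABLE sdm (idx INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)")
memory_con.executemany(
    "INSERT INTO sdm VALUES (?, ?, ?)",
    [(0, 'Alice', 20), (1, 'Billy', 25), (2, 'Chris', 30)],
)
memory_con.commit()

# Copy every page of the in-memory database into a file on disk
db_file = os.path.join(tempfile.mkdtemp(), 'model.db')
disk_con = sqlite3.connect(db_file)
memory_con.backup(disk_con)
disk_con.close()
memory_con.close()

# Reopen the file to confirm the contents survived
check_con = sqlite3.connect(db_file)
restored = check_con.execute("SELECT Name, Age FROM sdm ORDER BY idx").fetchall()
check_con.close()
print(restored)
```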

Note

Changelog:
  • Version 0.5.2 (2024-05-13):
    • Renamed db parameter to filename for package consistency and to avoid confusion between similarly named database objects.

    • Changed filename from keyword to positional argument making it a required parameter to avoid accidental overwriting.

  • Version 0.1.5 (2023-11-24):
    • New method.

to_markdown(filename: str = None, index: bool = False, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, horizontal_ellipses: str = '⠤⠄', index_rep: str = None, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = None) str | None[source]

Returns the current SQLDataModel as a markdown table literal if filename is None, otherwise writes the table to the provided file as markdown.

Parameters:
  • filename (str, optional) – The name of the file to write the Markdown content. If not provided, the Markdown content is returned as a string. Default is None.

  • index (bool, optional) – Whether to include the index column in the Markdown output. Default is False.

  • min_column_width (int, optional) – The minimum column width for table cells. Default is current value set on SQLDataModel.min_column_width.

  • max_column_width (int, optional) – The maximum column width for table cells. Default is current value set on SQLDataModel.max_column_width.

  • float_precision (int, optional) – The precision for floating-point values. Default is current value set on SQLDataModel.display_float_precision.

  • horizontal_ellipses (str, optional) – Characters to truncate column and cell values when horizontal truncation is required. Default is '⠤⠄'.

  • index_rep (str, optional) – String representation for the index. Default is None, using value set on SQLDataModel.sql_idx to represent the index column. Only used when generating index column, otherwise ignored when index = False.

  • column_alignment (Literal['dynamic', 'left', 'center', 'right'], optional) – The alignment for table columns. Default is current value set on SQLDataModel.column_alignment. 'dynamic': Dynamically aligns column content, right for numeric types and left for remaining types. 'left': Left-aligns all column content. 'center': Center-aligns all column content preferring left on uneven splits. 'right': Right-aligns all column content.

Raises:
  • TypeError – If the filename argument is not of type ‘str’, if float_precision, min_column_width or max_column_width arguments are not type ‘int’.

  • ValueError – If the column_alignment argument is provided and is not one of ‘dynamic’, ‘left’, ‘center’, or ‘right’.

  • Exception – If there is an OS related error encountered when opening or writing to the provided filename.

Returns:

If filename is None, returns the Markdown table as a string, if filename is provided, writes the Markdown table to the specified file and returns None.

Return type:

str or None

Column Alignment:
  • 'dynamic': Dynamically aligns column content, right for numeric types and left for remaining types.

  • 'left': Left-aligns all column content.

  • 'center': Center-aligns all column content preferring left on uneven splits.

  • 'right': Right-aligns all column content.

Examples:

To Markdown Literal

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data=data, headers=headers)

# Generate markdown table literal
markdown_table = df.to_markdown()

# View markdown output
print(markdown_table)

This will output:

| Name    |  Age |  Height |
|:--------|-----:|--------:|
| John    |   30 |  175.30 |
| Alice   |   28 |  162.00 |
| Michael |   35 |  185.80 |

Write to Markdown File

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data=data, headers=headers)

# Write the output to the file, using the default dynamic alignment
df.to_markdown(filename='Table.MD')

Contents of Table.MD:

| Name    |  Age |  Height |
|:--------|-----:|--------:|
| John    |   30 |  175.30 |
| Alice   |   28 |  162.00 |
| Michael |   35 |  185.80 |

Note

  • All markdown output will contain the alignment characters ':' as determined by the SQLDataModel.column_alignment attribute or parameter.

  • Any exception encountered during file reading or writing operations is caught and re-raised; see the related SQLDataModel.from_markdown().

  • Use index_rep to provide a different representation, column name, for the index column if included in output.

  • Unlike other representations, no rowwise or vertical truncation is performed on output content.

Changelog:
  • Version 0.11.0 (2024-07-05):
    • Added horizontal_ellipses parameter to allow customizing truncation characters used when column or cell values exceed maximum column widths.

    • Added index_rep parameter to allow customizing index column name with prior behavior set as default representation. Ignored when index = False.

    • Modified to use SQLDataModel.to_string() instead of generating independently formatted repr for more consistency between tabular outputs.

  • Version 0.10.4 (2024-07-03):
    • Modified to escape newline characters through utils.sqlite_printf_format() to avoid wrapping table rows.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.9 (2024-03-19):
    • New method.

to_numpy(index: bool = False, include_headers: bool = False) _np.ndarray[source]

Converts SQLDataModel to a NumPy ndarray object of shape (rows, columns). Note that the numpy package must be installed to use this method.

Parameters:
  • index (bool, optional) – If True, includes the model index in the result. Default is False.

  • include_headers (bool, optional) – If True, includes column headers in the result. Default is False.

Raises:

ModuleNotFoundError – If NumPy is not installed.

Returns:

The model’s data converted into a NumPy array.

Return type:

numpy.ndarray

Example:

import numpy
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Create the numpy array with default parameters, no indices or headers
result_array = df.to_numpy()

# View array
print(result_array)

This will output:

[['John' '30' '175.3']
 ['Alice' '28' '162.0']
 ['Travis' '35' '185.8']]

Model index and headers can also be retained:

# Create the numpy array with indices and headers
result_array = df.to_numpy(index=True, include_headers=True)

# View array
print(result_array)

This will output:

[['idx' 'Name' 'Age' 'Height']
 ['0' 'John' '30' '175.3']
 ['1' 'Alice' '28' '162.0']
 ['2' 'Travis' '35' '185.8']]

Note

  • Output will always be a 2-dimensional array of type numpy.ndarray

Changelog:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.3 (2023-10-15):
    • New method.

to_pandas(index: bool = False, include_headers: bool = True) _pd.DataFrame[source]

Converts SQLDataModel to a Pandas DataFrame object. Note that the pandas package must be installed to use this method.

Parameters:
  • index (bool, optional) – If True, includes the model index in the result. Default is False.

  • include_headers (bool, optional) – If True, includes column headers in the result. Default is True.

Raises:

ModuleNotFoundError – If Pandas is not installed.

Returns:

The model’s data converted to a Pandas DataFrame.

Return type:

pandas.DataFrame

Example:

import pandas
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df_sdm = sdm.SQLDataModel(data, headers)

# Convert the model to a pandas df
df_pd = df_sdm.to_pandas(include_headers=True, index=True)

# View result
print(df_pd)

This will output:

    Name  Age  Height
0    John   30   175.3
1   Alice   28   162.0
2  Travis   35   185.8

Changelog:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.1.3 (2023-10-15):
    • New method.

to_parquet(filename: str, index: bool = True, **kwargs) None[source]

Writes the current SQLDataModel to the specified parquet filename.

Parameters:
  • filename (str) – The file path to save the parquet file, e.g., filename = 'user/data/output.parquet'.

  • index (bool, optional) – Whether or not the SQLDataModel index should be included in the export. Default is True.

  • **kwargs – Additional keyword arguments to pass to the pyarrow write_table function.

Raises:
  • ModuleNotFoundError – If the required package pyarrow is not installed as determined by optionals._has_pa flag.

  • TypeError – If the filename argument is not of type ‘str’ representing a valid parquet file path.

  • Exception – If any unexpected exception occurs during the parquet writing process.

Returns:

If successful, a new parquet file filename is created and None is returned.

Return type:

None

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Rate']
data = [('Alice', 25, 26.50), ('Bob', 30, 21.25), ('Will', 35, 24.00)]

# Create the model
df = sdm.SQLDataModel(data, headers, display_index=False)

# Parquet file
pq_file = "output.parquet"

# Write the model as parquet file
df.to_parquet(pq_file)

# Confirm result by reading back file
df_result = sdm.from_parquet(pq_file)

# View model
print(df_result)

This will output:

┌───────┬──────┬────────┐
│ Name  │  Age │   Rate │
├───────┼──────┼────────┤
│ Alice │   25 │  26.50 │
│ Bob   │   30 │  21.25 │
│ Will  │   35 │  24.00 │
└───────┴──────┴────────┘
[3 rows x 3 columns]

Note

  • The pyarrow package is required to use this method as well as the SQLDataModel.from_parquet() method.

  • The SQLDataModel.to_dict() method is used prior to writing to parquet to convert the SQLDataModel into a dictionary suitable for parquet Table format.

  • Exceptions raised by the pyarrow package and its methods are caught and reraised when encountered to keep with package error formatting.
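The second note describes converting the model into a dictionary before writing. A minimal sketch of that row-to-column pivot in plain Python follows; it is not the SQLDataModel.to_dict() implementation, and the helper name rows_to_columns is hypothetical.

```python
headers = ['Name', 'Age', 'Rate']
data = [('Alice', 25, 26.50), ('Bob', 30, 21.25), ('Will', 35, 24.00)]


def rows_to_columns(headers, rows):
    """Pivot row tuples into a {column name: list of values} mapping."""
    columns = {name: [] for name in headers}
    for row in rows:
        for name, value in zip(headers, row):
            columns[name].append(value)
    return columns


columns = rows_to_columns(headers, data)
print(columns['Age'])
```

A mapping of column names to value lists is the shape that columnar constructors such as pyarrow.Table.from_pydict() accept.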

Changelog:
  • Version 0.8.2 (2024-06-24):
    • Added index parameter to toggle inclusion of SQLDataModel index column for greater flexibility and package consistency to similar methods.

  • Version 0.1.9 (2024-03-19):
    • New method.

to_pickle(filename: str = None) None[source]

Save the SQLDataModel instance to the specified filename as a pickle object.

Parameters:

filename (str, optional) – The file name to save the model to. If None, the invoking Python file’s name with a “.sdm” extension will be used.

Raises:

TypeError – If filename is provided but is not of type ‘str’ representing a valid pickle filepath.

Returns:

None

Example:

import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27)
    ,(1, 'sarah', 'west', 29)
    ,(2, 'mike', 'harlin', 36)
    ,(3, 'pat', 'douglas', 42)
]

# Create the SQLDataModel object
df = sdm.SQLDataModel(data, headers)

# Save the model's data as a pickle file "output.sdm"
df.to_pickle("output.sdm")

# Alternatively, leave blank to use the current file's name:
df.to_pickle()

# This way the same data can be recreated later by calling the from_pickle() method from the same project:
df = sdm.from_pickle()

Note

  • All data, headers, data types and display properties will be saved when pickling.

  • If no filename argument is provided, then the invoking module’s __name__ property will be used by default.
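As a rough illustration of what a pickle round trip preserves, the sketch below serializes a plain dictionary with the standard library. The payload layout is an assumption made for this example only; the actual structure written by to_pickle() is not described here and may differ.

```python
import pickle

# Hypothetical payload; the real layout written by to_pickle() may differ
payload = {
    'headers': ['idx', 'first', 'last', 'age'],
    'data': [(0, 'john', 'smith', 27), (1, 'sarah', 'west', 29)],
    'display_index': True,
}

# Serialize to bytes and restore, as a file-based round trip would
blob = pickle.dumps(payload)
restored = pickle.loads(blob)

# Values, tuple types and ordering all survive the round trip
print(restored == payload)
```

As with any pickle data, only load files from sources you trust.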

to_polars(index: bool = False, include_headers: bool = True) _pl.DataFrame[source]

Converts SQLDataModel to a Polars DataFrame object. Note that the polars package must be installed to use this method.

Parameters:
  • index (bool, optional) – If True, includes the model index in the result. Default is False.

  • include_headers (bool, optional) – If True, includes column headers in the result. Default is True.

Raises:

ModuleNotFoundError – If Polars is not installed.

Returns:

The model’s data converted to a Polars DataFrame.

Return type:

polars.DataFrame

Example:

import polars
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('Beth', 27, 172.4),
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df_sdm = sdm.SQLDataModel(data, headers)

# Convert the model to a polars df with the index
df_pl = df_sdm.to_polars(index=True)

# View result
print(df_pl)

This will output:

shape: (4, 4)
┌─────┬────────┬─────┬────────┐
│ idx ┆ Name   ┆ Age ┆ Height │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ i64 ┆ str    ┆ i64 ┆ f64    │
╞═════╪════════╪═════╪════════╡
│ 0   ┆ Beth   ┆ 27  ┆ 172.4  │
│ 1   ┆ John   ┆ 30  ┆ 175.3  │
│ 2   ┆ Alice  ┆ 28  ┆ 162.0  │
│ 3   ┆ Travis ┆ 35  ┆ 185.8  │
└─────┴────────┴─────┴────────┘

Note

  • See related SQLDataModel.from_polars() for the inverse method of converting a Polars DataFrame object into a SQLDataModel.

  • SQLDataModel uses different data types than those used in polars, see SQLDataModel.set_column_dtypes() for more information about casting rules.

  • Polars does not really have a concept of an index column, therefore when using index=True, the SQLDataModel index is just an additional column in the returned DataFrame object.

Changelog:
  • Version 1.1.0 (2024-10-22):
    • Added orient = 'row' argument to explicitly set data orientation when constructing dataframe.

  • Version 0.3.8 (2024-04-12):
    • New method.

to_pyarrow(index: bool = False) _pa.Table[source]

Returns the current SQLDataModel in Apache Arrow columnar format as a pyarrow.Table.

Parameters:

index (bool, optional) – Specifies whether to include the index of the SQLDataModel in the resulting Table. Default is False.

Raises:
  • ModuleNotFoundError – If the required package pyarrow is not installed.

  • Exception – If any unexpected exception occurs during the pyarrow conversion process.

Returns:

A table representing the current SQLDataModel in Apache Arrow columnar format.

Return type:

pyarrow.Table

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Grade']
data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2)]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Create the pyarrow table
table = df.to_pyarrow()

# View result
print(table)

This will output the pyarrow object details:

pyarrow.Table
Name: string
Age: int64
Grade: double
----
Name: [["Alice","Bob","Charlie"]]
Age: [[25,30,35]]
Grade: [[3.8,3.9,3.2]]

Note

  • Unmodified Python types follow the conversion and casting rules specified by the pyarrow implementation. For the modified date and datetime types, date32[day] and timestamp[us] are used, respectively.

Changelog:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

  • Version 0.2.3 (2024-03-28):
    • New method.

to_sql(table: str, con: Connection | Any, *, schema: str = None, if_exists: Literal['fail', 'replace', 'append'] = 'fail', index: bool = True, primary_key: str | int = None) None[source]

Insert the SQLDataModel into the specified table using the provided database connection.

Supported Connection APIs:
  • SQLite using sqlite3 or url with format 'file:///path/to/database.db'

  • PostgreSQL using psycopg2 or url with format 'postgresql://user:pass@hostname:port/db'

  • SQL Server ODBC using pyodbc or url with format 'mssql://user:pass@hostname:port/db'

  • Oracle using cx_Oracle or url with format 'oracle://user:pass@hostname:port/db'

  • Teradata using teradatasql or url with format 'teradata://user:pass@hostname:port/db'
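The url formats listed above all share the 'scheme://user:pass@hostname:port/db' shape. The sketch below shows how such a url breaks into its parts using only the standard library. It is not the package's utils._parse_connection_url(), and the helper name split_connection_url and the port 5432 are illustrative assumptions.

```python
from urllib.parse import urlsplit


def split_connection_url(url):
    """Split a 'scheme://user:pass@hostname:port/db' url into its parts."""
    parts = urlsplit(url)
    return {
        'scheme': parts.scheme,
        'user': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        'database': parts.path.lstrip('/'),
    }


info = split_connection_url('postgresql://user:pass@hostname:5432/db')
print(info)
```

The scheme is what would select the driver module, which is why a missing driver surfaces as a ModuleNotFoundError.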

Parameters:
  • table (str) – The name of the table where data will be inserted.

  • con (sqlite3.Connection | Any) – The database connection object or connection url. Supported connection APIs are sqlite3, psycopg2, pyodbc, cx_Oracle and teradatasql.

  • schema (str, optional) – The schema to use for PostgreSQL and ODBC SQL Server connections, ignored otherwise. Default is None.

  • if_exists (Literal['fail', 'replace', 'append'], optional) – Action to take if the table already exists. If 'fail', an error is raised if the table exists and no inserts occur. If 'replace', any existing table is dropped prior to inserts. If 'append', the existing table is appended to by subsequent inserts. Default is 'fail'.

  • index (bool, optional) – If the model index should be included in the target table. Default is True.

  • primary_key (str | int, optional) – Column name or index to use as table primary key. Default is None, using the index column as the primary key when index=True.
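The three if_exists modes can be illustrated with the standard sqlite3 module alone. This is a sketch of the described semantics rather than the to_sql() implementation; the helper write_rows and its fixed two-column schema are hypothetical.

```python
import sqlite3


def write_rows(con, table, rows, if_exists='fail'):
    """Hypothetical helper mimicking the three if_exists modes."""
    if if_exists not in ('fail', 'replace', 'append'):
        raise ValueError(f"invalid if_exists: {if_exists!r}")
    exists = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone() is not None
    if exists and if_exists == 'fail':
        # No inserts occur when the table is already present
        raise ValueError(f"table '{table}' already exists")
    if exists and if_exists == 'replace':
        # Drop the existing table before inserting
        con.execute(f'DROP TABLE "{table}"')
        exists = False
    if not exists:
        con.execute(f'CREATE TABLE "{table}" ("Name" TEXT, "Age" INTEGER)')
    con.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    con.commit()


con = sqlite3.connect(':memory:')
write_rows(con, 'users', [('Alice', 25), ('Bob', 30)])
write_rows(con, 'users', [('Charlie', 35)], if_exists='append')
count_after_append = con.execute('SELECT COUNT(*) FROM users').fetchone()[0]
write_rows(con, 'users', [('David', 28)], if_exists='replace')
count_after_replace = con.execute('SELECT COUNT(*) FROM users').fetchone()[0]
print(count_after_append, count_after_replace)
```

With 'append' the existing rows are kept, while 'replace' leaves only the newly inserted rows.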

Raises:
  • SQLProgrammingError – If an error occurs during cursor accessing, table creation or data insertion into the database.

  • ModuleNotFoundError – If con is provided as a connection url and the specified scheme driver module is not found.

  • ValueError – If specified table already exists when using if_exists='fail' or if con is not one of the currently supported connection modules.

  • IndexError – If primary_key is provided as an int representing a column index but is out of range of the current model SQLDataModel.column_count.

  • TypeError – If primary_key argument provided is not of type ‘str’ or ‘int’ representing a valid column name or index to use as the primary key column for the target table.

Returns:

None

Example:

import sqlite3
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Grade']
data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2), ('David', 28, 3.4)]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Create connection object
sqlite_db_conn = sqlite3.connect('students.db')

# Basic usage, creating a new table
df.to_sql('users', sqlite_db_conn)

This will create a new table users, or fail if one already exists:

sqlite> select * from users;

idx  Name     Age  Grade
---  -------  ---  -----
0    Alice    25   3.8
1    Bob      30   3.9
2    Charlie  35   3.2
3    David    28   3.4

Connect to PostgreSQL, SQL Server, Oracle or Teradata:

import psycopg2
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Grade']
data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2), ('David', 28, 3.4)]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Setup the connection, whether using psycopg2 or other supported modules like pyodbc
con = psycopg2.connect(...)

# Create or replace existing table in database
df.to_sql('users', con, if_exists='replace', index=False)

This will result in a new table users in our PostgreSQL database:

=> select * from users;

Name    | Age | Grade |
--------+-----+-------+
Alice   |  25 |   3.8 |
Bob     |  30 |   3.9 |
Charlie |  35 |   3.2 |
David   |  28 |   3.4 |

For SQL Server connections using pyodbc, the example would be almost identical except for which con object we use:

import pyodbc

# For SQL Server ODBC connections using pyodbc
con = pyodbc.connect(...)

The same is true for Oracle and other connections:

import cx_Oracle

# For Oracle connections using cx_Oracle
con = cx_Oracle.connect(...)

Using a Primary Key

import sqlite3
import sqldatamodel as sdm

# Sample data
headers = ['ID', 'User']
data = [(1001, 'Alice'), (1002, 'Bob'), (1003, 'Charlie'), (1004, 'David')]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Create connection object
sqlite_db_conn = sqlite3.connect('students.db')

# Create the table using the 'ID' column as the primary key
df.to_sql('users', sqlite_db_conn, if_exists='replace', index=False, primary_key='ID')

This will create a users table with the schema:

sqlite> .schema users

CREATE TABLE "users" ( "ID" INTEGER PRIMARY KEY,  "User" TEXT);

With the ID column as its primary key:

sqlite> select * from users;

ID    User
----  -------
1001  Alice
1002  Bob
1003  Charlie
1004  David

If table creation is necessary, column types will be mapped according to the destination database by the following conversion:

┌─────────────────┬─────────┬─────────┬────────┬─────────┬────────────────┬──────┬───────────┐
│ Database \ Type │ NULL    │ INTEGER │ REAL   │ TEXT    │ BLOB           │ DATE │ TIMESTAMP │
├─────────────────┼─────────┼─────────┼────────┼─────────┼────────────────┼──────┼───────────┤
│ PostgreSQL      │ UNKNOWN │ INTEGER │ FLOAT  │ TEXT    │ BYTEA          │ DATE │ TIMESTAMP │
│ SQL Server ODBC │ UNKNOWN │ INTEGER │ FLOAT  │ TEXT    │ VARBINARY(MAX) │ DATE │ DATETIME  │
│ Oracle          │ UNKNOWN │ NUMBER  │ NUMBER │ VARCHAR │ BLOB           │ DATE │ DATETIME  │
│ Teradata        │ UNKNOWN │ INTEGER │ FLOAT  │ VARCHAR │ BYTE           │ DATE │ DATETIME  │
│ SQLite          │ NULL    │ INTEGER │ REAL   │ TEXT    │ BLOB           │ DATE │ TIMESTAMP │
└─────────────────┴─────────┴─────────┴────────┴─────────┴────────────────┴──────┴───────────┘
[5 rows x 8 columns]
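For quick lookups, the conversion table can be transcribed into a nested dictionary. The sketch below copies the values shown above; the names TYPE_MAP and map_type are hypothetical and not part of the package.

```python
# Transcribed from the conversion table above (source type -> destination type)
TYPE_MAP = {
    'PostgreSQL': {
        'NULL': 'UNKNOWN', 'INTEGER': 'INTEGER', 'REAL': 'FLOAT', 'TEXT': 'TEXT',
        'BLOB': 'BYTEA', 'DATE': 'DATE', 'TIMESTAMP': 'TIMESTAMP',
    },
    'SQL Server ODBC': {
        'NULL': 'UNKNOWN', 'INTEGER': 'INTEGER', 'REAL': 'FLOAT', 'TEXT': 'TEXT',
        'BLOB': 'VARBINARY(MAX)', 'DATE': 'DATE', 'TIMESTAMP': 'DATETIME',
    },
    'Oracle': {
        'NULL': 'UNKNOWN', 'INTEGER': 'NUMBER', 'REAL': 'NUMBER', 'TEXT': 'VARCHAR',
        'BLOB': 'BLOB', 'DATE': 'DATE', 'TIMESTAMP': 'DATETIME',
    },
    'Teradata': {
        'NULL': 'UNKNOWN', 'INTEGER': 'INTEGER', 'REAL': 'FLOAT', 'TEXT': 'VARCHAR',
        'BLOB': 'BYTE', 'DATE': 'DATE', 'TIMESTAMP': 'DATETIME',
    },
    'SQLite': {
        'NULL': 'NULL', 'INTEGER': 'INTEGER', 'REAL': 'REAL', 'TEXT': 'TEXT',
        'BLOB': 'BLOB', 'DATE': 'DATE', 'TIMESTAMP': 'TIMESTAMP',
    },
}


def map_type(database, sql_type):
    """Return the destination column type for a given source type."""
    return TYPE_MAP[database][sql_type]


print(map_type('Oracle', 'TEXT'))
```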

Note

  • When providing a primary_key column, it is assumed to be unique and the model will not enforce any uniqueness constraints.

  • When con is provided as a string, a connection will be attempted using utils._create_connection() if the path does not exist; otherwise a local sqlite3 connection will be attempted.

  • When con is provided as an object a connection is assumed to be open and valid, if a cursor cannot be created from the object an exception will be raised.

  • Connections with write access can be used in the SQLDataModel.to_sql() method for writing to the same connection types, so use them with care.

  • ValueError will be raised if table already exists, use if_exists = 'replace' or if_exists = 'append' to instead replace or append to the table.

  • See relevant module documentation for additional details or information pertaining to specific database or connection dialect being used.

  • See related SQLDataModel.from_sql() for creating SQLDataModel from existing SQL database connections.

  • See utility methods utils._parse_connection_url() and utils._create_connection() for implementation on creating database connections from urls.
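Because the model does not check a primary_key column for uniqueness, duplicate keys only surface when the database rejects them. The sketch below shows that behaviour with plain sqlite3, plus a simple pre-check on the source values. How to_sql() itself reports such an error is not shown here.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE "users" ("ID" INTEGER PRIMARY KEY, "User" TEXT)')
con.execute('INSERT INTO "users" VALUES (?, ?)', (1001, 'Alice'))

try:
    # Reusing an existing key violates the PRIMARY KEY constraint
    con.execute('INSERT INTO "users" VALUES (?, ?)', (1001, 'Bob'))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A cheap pre-check on the source data before choosing a primary key
ids = [1001, 1002, 1003, 1004]
is_unique = len(ids) == len(set(ids))
print(duplicate_rejected, is_unique)
```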

Changelog:
  • Version 0.9.1 (2024-06-27):
    • Modified handling of con parameter to allow database connection url to also be provided as 'scheme://user:pass@host:port/db'

  • Version 0.8.2 (2024-06-24):
    • Modified handling of con parameter to allow providing SQLite database filepath directly as string to instantiate connection.

  • Version 0.3.0 (2024-03-31):
    • Renamed arguments extern_con: con, replace_existing: if_exists, include_index: index.

    • Added primary_key argument for specifying a primary key column for table schema.

    • Added schema argument for specifying a target schema for the table.

to_string(index: bool = None, display_max_rows: int = None, display_max_width: int = None, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, vertical_ellipses: str = '⠒⠂', horizontal_ellipses: str = '⠤⠄', display_dimensions: bool = False, index_rep: str = None, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = None, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = None) str[source]

Generate a tabular representation of the model based on custom parameters and bounds.

Parameters:
  • index (bool, optional) – Whether to include the index column in the output. Default is None, using SQLDataModel.display_index value.

  • display_max_rows (int, optional) – Maximum number of rows to display. Default is None, using SQLDataModel.display_max_rows value.

  • display_max_width (int, optional) – Maximum character width of the output table before horizontal truncation occurs. Default is None, generating a full width representation.

  • min_column_width (int, optional) – Minimum width of columns. Default is None, using SQLDataModel.min_column_width value.

  • max_column_width (int, optional) – Maximum width of columns. Default is None, using SQLDataModel.max_column_width value with a floor value of 2.

  • float_precision (int, optional) – Precision for displaying float values. Default is None, using SQLDataModel.display_float_precision value.

  • vertical_ellipses (str, optional) – Characters to represent row truncation when vertical truncation is required. Default is '⠒⠂'.

  • horizontal_ellipses (str, optional) – Characters to represent column truncation when horizontal truncation is required. Default is '⠤⠄'.

  • display_dimensions (bool, optional) – Whether to display the dimensions of the table. Defaults to False.

  • index_rep (str, optional) – String representation for the index. Default is None, using a single whitespace character to represent the index column. Only used when generating index column, otherwise ignored when index = False.

  • column_alignment (Literal['dynamic', 'left', 'center', 'right'], optional) – Alignment for columns. Default is None, using SQLDataModel.column_alignment value.

  • table_style (Literal['ascii','bare','dash','default','double','latex','list','markdown','outline','pandas','polars','postgresql','round','rst-grid','rst-simple'], optional) – Table style. Default is None, using SQLDataModel.table_style value.

Returns:

A string representing the tabular output of the model with the restrictions and styling applied.

Return type:

str

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Gender', 'City']
data = [
    ('Alice', 38, 'Female', 'Milwaukee'),
    ('Sarah', None, 'Female', 'Houston'),
    ('Michael', 42, 'Male', 'Atlanta'),
    ('John', None, 'Male', 'Boston'),
    ('Bobby', 25, 'Male', 'Chicago'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Generate a Markdown style representation using 'ID' to represent the index
markdown_repr = df.to_string(table_style='markdown', index_rep='ID')

This will generate a ‘Markdown’ styled representation:

| ID | Name    | Age | Gender | City      |
|----|---------|-----|--------|-----------|
|  0 | Alice   |  38 | Female | Milwaukee |
|  1 | Sarah   |     | Female | Houston   |
|  2 | Michael |  42 | Male   | Atlanta   |
|  3 | John    |     | Male   | Boston    |
|  4 | Bobby   |  25 | Male   | Chicago   |

Vertical and horizontal limits can also be applied:

# Set vertical and horizontal limits with custom styling
truncated_repr = df.to_string(
    table_style='polars',
    display_max_rows=4,
    display_max_width=36,
    horizontal_ellipses='..',
    vertical_ellipses='...',
    index=False
)

# View output
print(truncated_repr)

This will output a vertically and horizontally truncated representation that fits within the bounds provided:

┌───────┬─────┬────┬───────────┐
│ Name  ┆ Age ┆ .. ┆ City      │
╞═══════╪═════╪════╪═══════════╡
│ Alice ┆  38 ┆ .. ┆ Milwaukee │
│ Sarah ┆     ┆ .. ┆ Houston   │
│  ...  ┆ ... ┆ .. ┆    ...    │
│ John  ┆     ┆ .. ┆ Boston    │
│ Bobby ┆  25 ┆ .. ┆ Chicago   │
└───────┴─────┴────┴───────────┘

Note

  • Table styles reflect style similarity only; format-specific methods should be used for generating complete and valid output.

  • Vertical truncation characters are applied to column wide truncation and horizontal truncation characters are applied at row and cell level.

  • When a discrepancy exists between minimum and maximum column widths, conflict is resolved by setting max width equal to max(min_column_width, max_column_width).

  • See SQLDataModel.to_text() for writing textual representation directly to ‘.txt’ files.

  • See SQLDataModel.set_table_style() for available style options and output examples.
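The width rule and the vertical truncation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the to_string() implementation: the helper names are hypothetical, and splitting the visible rows evenly between head and tail is assumed from the truncated example output.

```python
def resolve_max_width(min_column_width, max_column_width):
    """Resolve a min/max conflict as described in the note above."""
    return max(min_column_width, max_column_width)


def truncate_rows(rows, display_max_rows, vertical_ellipses='⠒⠂'):
    """Keep leading and trailing rows, marking the gap with an ellipsis row.

    The even head/tail split is an assumption of this sketch.
    """
    if len(rows) <= display_max_rows:
        return list(rows)
    head = (display_max_rows + 1) // 2
    tail = display_max_rows - head
    kept_tail = list(rows[-tail:]) if tail else []
    return list(rows[:head]) + [vertical_ellipses] + kept_tail


names = ['Alice', 'Sarah', 'Michael', 'John', 'Bobby']
shown = truncate_rows(names, 4, vertical_ellipses='...')
print(shown)
print(resolve_max_width(10, 4))
```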

Changelog:
  • Version 2.3.0 (2026-01-21):
    • Fixed issue where providing float_precision had no actual impact on the display float precision used in output.

  • Version 0.11.0 (2024-07-05):
    • New method.

to_text(filename: str = None, index: bool = None, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, horizontal_ellipses: str = '⠤⠄', index_rep: str = ' ', display_dimensions: bool = False, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = None, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = None) str | None[source]

Returns a textual representation of the current SQLDataModel as a string literal or by writing to file if a filename is provided.

Parameters:
  • filename (str, optional) – The name of the file to write the text content. If provided, writes the text to the specified file. Default is None.

  • index (bool, optional) – Whether to include the index column in the text output. Default is value set on SQLDataModel.display_index.

  • min_column_width (int, optional) – The minimum column width for table cells. Default is value set on SQLDataModel.min_column_width.

  • max_column_width (int, optional) – The maximum column width for table cells. Default is value set on SQLDataModel.max_column_width.

  • float_precision (int, optional) – The precision for floating-point values. Default is value set on SQLDataModel.display_float_precision.

  • horizontal_ellipses (str, optional) – Characters to truncate column and cell values when horizontal truncation is required. Default is '⠤⠄'.

  • index_rep (str, optional) – String representation for the index. Default is ' ', using a single whitespace character to represent the index column. Only used when generating the index column; ignored when index = False.

  • display_dimensions (bool, optional) – Whether to include the model dimensions [N rows x N cols] in the text output. Default is False.

  • column_alignment (Literal['dynamic', 'left', 'center', 'right'], optional) – Column alignment. Default is value at SQLDataModel.column_alignment. 'dynamic': Dynamically aligns column content, right for numeric types and left for remaining types. 'left': Left-aligns all column content. 'center': Center-aligns all column content preferring left on uneven splits. 'right': Right-aligns all column content.

  • table_style (Literal['ascii','bare','dash','default','double','latex','list','markdown','outline','pandas','polars','postgresql','round','rst-grid','rst-simple'], optional) – The table styling to use. Default is value set on SQLDataModel.table_style.

Raises:
  • TypeError – If arguments are provided but are not the correct types: filename (str), index (bool), min_column_width (int), max_column_width (int), float_precision (int).

  • ValueError – If the column_alignment argument is provided and is not one of ‘dynamic’, ‘left’, ‘center’, or ‘right’.

  • Exception – If there is an OS related error encountered when opening or writing to the provided filename.

Returns:

If filename is None, returns the textual representation as a string. If filename is provided, writes the textual representation to the specified file and returns None.

Return type:

str or None

Examples:

Returning Text Literal
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data=data, headers=headers)

# Generate text table literal
text_table = df.to_text()

# View output
print(text_table)

This will output:

┌─────────┬──────┬────────┐
│ Name    │  Age │ Height │
├─────────┼──────┼────────┤
│ John    │   30 │  175.3 │
│ Alice   │   28 │  162.0 │
│ Michael │   35 │  185.8 │
└─────────┴──────┴────────┘

Write to File
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data=data, headers=headers)

# Write the output to the file, center-aligning all columns
df.to_text(filename='Table.txt', column_alignment='center')

Contents of Table.txt:

┌───┬─────────┬──────┬────────┐
│   │  Name   │ Age  │ Height │
├───┼─────────┼──────┼────────┤
│ 0 │  John   │  30  │ 175.3  │
│ 1 │  Alice  │  28  │ 162.0  │
│ 2 │ Michael │  35  │ 185.8  │
└───┴─────────┴──────┴────────┘

Important

Unlike output from print(df) or other calls to SQLDataModel.__repr__(), the output from this method includes the full SQLDataModel and is not restricted by current terminal size or the value set at SQLDataModel.display_max_rows. As such, horizontal truncation only occurs on cell values as determined by max_column_width and no other horizontal or vertical table-wide truncation is performed.
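Cell-level truncation by max_column_width can be pictured as follows. This is one plausible sketch, not the package's formatting code: the helper truncate_cell is hypothetical, and placing the ellipses at the end of the shortened value is an assumption.

```python
def truncate_cell(value, max_column_width, horizontal_ellipses='⠤⠄'):
    """Shorten a cell value so it fits within max_column_width characters."""
    text = str(value)
    if len(text) <= max_column_width:
        return text
    # Reserve room for the ellipses, then clip to the allowed width
    keep = max(max_column_width - len(horizontal_ellipses), 0)
    return (text[:keep] + horizontal_ellipses)[:max_column_width]


cell = truncate_cell('Milwaukee, Wisconsin', 12, horizontal_ellipses='..')
print(cell)
```

Values that already fit are returned unchanged, so only oversized cells carry the ellipses.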

Changelog:
  • Version 0.11.0 (2024-07-05):
    • Added horizontal_ellipses parameter to allow customizing truncation characters used when column or cell values exceed maximum column widths.

    • Added index_rep parameter to allow customizing index column name with prior behavior set as default representation. Ignored when index = False.

    • Modified to use SQLDataModel.to_string() instead of generating independently formatted repr for more consistency between tabular outputs.

  • Version 0.10.4 (2024-07-03):
    • Modified to escape newline characters through utils.sqlite_printf_format() to avoid wrapping table rows.

  • Version 0.9.3 (2024-06-28):
    • Added additional options ‘rst-simple’ and ‘rst-grid’ for table_style parameter.

  • Version 0.3.10 (2024-04-16):
    • Added table_style parameter and updated output to reflect new formatting styles introduced in version 0.3.9.

    • Added display_dimensions parameter to allow toggling display of table dimensions in output.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

to_xml(filename: str | None = None, root_tag: str = 'data', row_tag: str = 'row', column_tag: str = 'column', value_tag: str = 'value', orient: Literal['rows', 'columns'] = 'rows', index: bool | None = None, encoding: str = 'utf-8', pretty: bool = True, xml_declaration: bool = True) str | None[source]

Converts the SQLDataModel instance to XML format. If filename is specified, writes the XML to file; otherwise returns the XML string literal.

Parameters:
  • filename (str | None) – Output file path. If None, returns XML as string.

  • root_tag (str, optional) – Root element name. Default is ‘data’.

  • row_tag (str, optional) – Row element name. Default is ‘row’.

  • column_tag (str, optional) – Column element name. Default is ‘column’.

  • value_tag (str, optional) – Value element name. Default is ‘value’.

  • orient (Literal['rows','columns'], optional) – Orientation of the XML output. 'rows' (default): Each row is serialized as a <row> element. 'columns': Each column is serialized as a <column> element containing one or more <value> elements.

  • index (bool | None) – Whether to include index column. Defaults to display_index.

  • encoding (str, optional) – Output encoding. Default ‘utf-8’.

  • pretty (bool, optional) – Whether to pretty-print XML output. Default True.

  • xml_declaration (bool, optional) – Whether to include the XML declaration <?xml version="1.0" encoding="utf-8"?> at the top of the output. Default is True.

Raises:
  • TypeError – If filename is provided and is not of type str representing a filepath to write the XML data to.

  • OSError – If an error is encountered when trying to access or write to the specified file.

Returns:

If filename is None, returns the XML representation as a string. If filename is provided, writes the XML representation to the specified file and returns None.

Return type:

str or None

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Grade']
data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2)]

# Create the model and generate the XML data
df = sdm.SQLDataModel(data, headers)
xml_data = df.to_xml(index=False)

# View the resulting XML literal
print(xml_data)

This will output the XML representation of our sample data:

<data>
    <row>
        <Name>Alice</Name>
        <Age>25</Age>
        <Grade>3.8</Grade>
    </row>
    <row>
        <Name>Bob</Name>
        <Age>30</Age>
        <Grade>3.9</Grade>
    </row>
    <row>
        <Name>Charlie</Name>
        <Age>35</Age>
        <Grade>3.2</Grade>
    </row>
</data>

Orient by columns:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Grade']
data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2)]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Alternatively, we can orient the XML data by columns instead of rows
xml_by_cols = df.to_xml(orient='columns', index=False)

# View the resulting XML literal
print(xml_by_cols)

This will output the data in columnar orientation:

<data>
    <column name="Name">
        <value>Alice</value>
        <value>Bob</value>
        <value>Charlie</value>
    </column>
    <column name="Age">
        <value>25</value>
        <value>30</value>
        <value>35</value>
    </column>
    <column name="Grade">
        <value>3.8</value>
        <value>3.9</value>
        <value>3.2</value>
    </column>
</data>

Note

  • Columns with names that are not valid XML tags are serialized using a <col> element with the original name stored in a name attribute for round-trip safety.

  • The XML declaration can be excluded by setting xml_declaration=False, which is useful when embedding the output as an XML fragment inside a larger document.

  • When orient='columns' is used, the output is fully compatible with SQLDataModel.from_xml() called with orient='columns', allowing lossless round-trip conversion.
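The row-oriented layout and the fallback for invalid tag names described in the notes above can be illustrated with the standard library alone. This is a minimal sketch, not the package's implementation: `rows_to_xml` is a hypothetical helper, and `str.isidentifier()` is used only as a rough stand-in for a real XML name check.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(headers, rows, root_tag="data", row_tag="row"):
    """Hypothetical helper: build a row-oriented XML string from tabular data."""
    root = ET.Element(root_tag)
    for row in rows:
        row_el = ET.SubElement(root, row_tag)
        for name, value in zip(headers, row):
            if name.isidentifier():
                # Header is usable as a tag name (simplified check)
                cell = ET.SubElement(row_el, name)
            else:
                # Fall back to a generic element and keep the name as an attribute
                cell = ET.SubElement(row_el, "col", {"name": name})
            cell.text = str(value)
    ET.indent(root, space="    ")  # pretty-print, requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

# 'Unit Price' contains a space, so it cannot be used as a tag name
xml_text = rows_to_xml(["Name", "Unit Price"], [("Widget", 9.5)])
print(xml_text)
```

Storing the original header in a `name` attribute keeps the information needed to rebuild the column names when the XML is read back.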

Changelog:
  • Version 2.3.1 (2026-01-22):
    • New method.

transpose(infer_types: bool = True, include_headers: bool = False) SQLDataModel[source]

Transposes the model and returns as a new SQLDataModel.

Parameters:
  • infer_types (bool, optional) – If types should be inferred after the transposition. Defaults to True.

  • include_headers (bool, optional) – If headers are included in the transposed data. Defaults to False.

Returns:

The transposition of the model as a new SQLDataModel instance.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Create the model
df = sdm.SQLDataModel([('A1', 'A2'), ('B1', 'B2'), ('C1', 'C2')])

# Transpose it
df_transposed = df.transpose()

# View original
print(f"Original:\n{df}")

# Along with transposed
print(f"Transposed:\n{df_transposed}")

This will output the result of the transposition:

Original:
┌───┬─────┬─────┐
│   │ 0   │ 1   │
├───┼─────┼─────┤
│ 0 │ A1  │ A2  │
│ 1 │ B1  │ B2  │
│ 2 │ C1  │ C2  │
└───┴─────┴─────┘
[3 rows x 2 columns]

Transposed:
┌───┬─────┬─────┬─────┐
│   │ 0   │ 1   │ 2   │
├───┼─────┼─────┼─────┤
│ 0 │ A1  │ B1  │ C1  │
│ 1 │ A2  │ B2  │ C2  │
└───┴─────┴─────┴─────┘
[2 rows x 3 columns]

Note

  • When infer_types=False, the first row of the transposed result will be used to set the dtypes of the new model. This is generally a poor choice considering the nature of transposing data.

  • If include_headers=True, the headers will be included as the first row in the transposed data.

  • Applying this method twice in succession should return the original model, i.e. sdm == sdm.transpose().transpose()
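The round-trip property in the last note can be checked on plain Python data. The sketch below uses a hypothetical `transpose_rows` helper built on `zip` and only illustrates the idea; it does not reproduce the type inference or header handling of the method itself.

```python
def transpose_rows(rows):
    """Hypothetical helper: swap rows and columns of a rectangular table."""
    return [tuple(column) for column in zip(*rows)]

original = [('A1', 'A2'), ('B1', 'B2'), ('C1', 'C2')]

# 3 rows x 2 columns becomes 2 rows x 3 columns
transposed = transpose_rows(original)

# Transposing again restores the original layout
round_trip = transpose_rows(transposed)

print(transposed)
print(round_trip == original)
```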

Changelog:
  • Version 0.3.5 (2024-04-08):
    • New method.

unique(ignore_index: bool = True) SQLDataModel[source]

Returns a new model using the unique values of the current model, keeping the first by order of appearance.

Parameters:

ignore_index (bool, optional) – If True, the original index of the unique values is ignored. If False, the original index is kept. Default is True.

Returns:

A new model consisting of the unique values contained in the original model.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
data = [
    ('Bob', 'Chicago'),
    ('Bob', 'Chicago'),
    ('Bob', 'Chicago'),
    ('Alice', 'New York'),
    ('Alice', 'New York'),
    ('Charles', 'Boston')
]

# Create the model
df = sdm.SQLDataModel(data, headers=['Name', 'City'])

# Create a new model from only unique rows
df_unique = df.unique()

# View it
print(df_unique)

This will output the first unique rows, ignoring the original indices:

┌───┬─────────┬──────────┐
│   │ Name    │ City     │
├───┼─────────┼──────────┤
│ 0 │ Bob     │ Chicago  │
│ 1 │ Alice   │ New York │
│ 2 │ Charles │ Boston   │
└───┴─────────┴──────────┘
[3 rows x 2 columns]

Alternatively, the original index for each unique row can be retained:

# Do not ignore the indices
df_unique_with_idx = df.unique(ignore_index=False)

# View it
print(df_unique_with_idx)

This will output a similar result, but the original indices of the rows kept are retained:

┌───┬─────────┬──────────┐
│   │ Name    │ City     │
├───┼─────────┼──────────┤
│ 0 │ Bob     │ Chicago  │
│ 3 │ Alice   │ New York │
│ 5 │ Charles │ Boston   │
└───┴─────────┴──────────┘
[3 rows x 2 columns]

This method is particularly useful when needing to extract subsets of data, for example:

# Sample data
headers = ['Name', 'Age', 'Department']
data = [
    ('Alice', 38, 'HR'),
    ('Carol', 37, 'HR'),
    ('Billy', 23, 'Marketing'),
    ('Nate',  28, 'Sales'),
    ('Jill',  27, 'Sales'),
    ('John',  31, 'Engineering'),
    ('Kyle',  32, 'Engineering'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter rows by 'Age' and return unique 'Department' values
under_30_depts = df[df['Age'] < 30, 'Department'].unique()

# View it
print(under_30_depts)

This will output the unique ‘Department’ values for those rows matching the ‘Age’ filter:

┌───┬────────────┐
│   │ Department │
├───┼────────────┤
│ 0 │ Marketing  │
│ 1 │ Sales      │
└───┴────────────┘
[2 rows x 1 columns]
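The "keep the first by order of appearance" behaviour shown in the examples above can be sketched in plain Python. `unique_rows` is a hypothetical helper, not part of the package; it assumes rows are hashable tuples and returns (index, row) pairs so that the effect of ignore_index is visible.

```python
def unique_rows(rows, ignore_index=True):
    """Hypothetical helper: keep the first occurrence of each row."""
    first_seen = {}
    for position, row in enumerate(rows):
        # setdefault only records the position of the first occurrence
        first_seen.setdefault(row, position)
    if ignore_index:
        # Renumber the surviving rows from zero
        return list(enumerate(first_seen))
    # Keep the position each surviving row had in the input
    return [(position, row) for row, position in first_seen.items()]

data = [
    ('Bob', 'Chicago'),
    ('Bob', 'Chicago'),
    ('Bob', 'Chicago'),
    ('Alice', 'New York'),
    ('Alice', 'New York'),
    ('Charles', 'Boston'),
]

print(unique_rows(data))
print(unique_rows(data, ignore_index=False))
```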

Changelog:
  • Version 1.3.0 (2025-02-09):
    • New method.

update_index_at(row_index: int, column_index: int | str, value: Any = None) None[source]

Updates a specific cell in the SQLDataModel at the given row and column indices with the provided value.

Parameters:
  • row_index (int) – The index of the row to be updated.

  • column_index (int or str) – The index or name of the column to be updated.

  • value (Any, optional) – The new value to be assigned to the specified cell.

Raises:
  • TypeError – If row_index is not of type ‘int’ or if column_index is not of type ‘int’ or ‘str’.

  • IndexError – If row or column provided as an ‘int’ but is outside of the current model row or column range.

  • ValueError – If column provided as a ‘str’ but is not found in the current model headers.

  • SQLProgrammingError – If there is an issue with the SQL execution during the update.

Returns:

None

Example:

import sqldatamodel as sdm

# Create an initial 3x3 model filled with dashes
df = sdm.from_shape((3,3), fill='---', headers=['A', 'B', 'C'])

# Update cell based on integer indices
df.update_index_at(0, 0, 'Top Left')
df.update_index_at(0, 2, 'Top Right')

# Update cell based on row index and column name
df.update_index_at(2, 'A', 'Bottom Left')
df.update_index_at(2, 'C', 'Bottom Right')

# Update based on negative row and column indexing
df.update_index_at(-2, -2, 'Center')

# View result
print(df)

This will output the cumulative result of our updates:

┌───┬─────────────┬────────┬──────────────┐
│   │ A           │ B      │ C            │
├───┼─────────────┼────────┼──────────────┤
│ 0 │ Top Left    │ ---    │ Top Right    │
│ 1 │ ---         │ Center │ ---          │
│ 2 │ Bottom Left │ ---    │ Bottom Right │
└───┴─────────────┴────────┴──────────────┘
[3 rows x 3 columns]

Important

Indexing is positional, using zero-based integers, and is not based on index value. Most of the time this distinction is irrelevant, as the row at position ‘0’ will have an index value of ‘0’. However, this can change after transformation operations like filter or sort. To reset and realign the index values use SQLDataModel.reset_index(), or use SQLDataModel.indicies to view the current row indices.

Note

  • This method only updates individual cells in the current model based on integer indexing for both rows and columns using their (row, column) position.

  • To broadcast updates across row and column dimensions use the syntax of sdm[row, column] = value or see SQLDataModel.__setitem__() for more details.
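The difference between a row's position and its index value, and the handling of negative positions, can be shown with plain lists. This is only an illustration under simplified assumptions: `normalize_position` is a hypothetical helper, and the filtered data is simulated rather than produced by the package.

```python
def normalize_position(position, length):
    """Hypothetical helper: map a possibly negative position to a zero-based one."""
    if not isinstance(position, int):
        raise TypeError("position must be an int")
    if not -length <= position < length:
        raise IndexError("position out of range")
    return position % length

# Index values paired with an 'Age' column
index_values = [0, 1, 2, 3]
ages = [30, 28, 25, 35]

# Simulate a filter that keeps only ages under 30
kept = [(idx, age) for idx, age in zip(index_values, ages) if age < 30]

# Position 0 now refers to the row whose index value is 1
first_index_value, first_age = kept[0]
print(first_index_value, first_age)

# On a 3-row model, position -2 resolves to position 1
print(normalize_position(-2, 3))
```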

Changelog:
  • Version 0.8.0 (2024-06-21):
    • Modified to allow row_index and column_index arguments the same input type flexibility found across package, allowing both to be referenced directly or by their integer index.

  • Version 0.5.2 (2024-05-13):
    • Modified row_index parameter to use SQLDataModel.indicies to index into rows in lieu of row index value equality.

  • Version 0.1.9 (2024-03-19):
    • New method.

vstack(*other: SQLDataModel, inplace: bool = False) SQLDataModel[source]

Vertically stacks one or more SQLDataModel objects to the current model.

Parameters:
  • other (SQLDataModel or sequence of) – The SQLDataModel objects to vertically stack.

  • inplace (bool, optional) – If True, performs the vertical stacking in-place, modifying the current model. Defaults to False, returning a new SQLDataModel.

Returns:

The vertically stacked SQLDataModel instance when inplace is False.

Return type:

SQLDataModel

Raises:
  • ValueError – If no additional SQLDataModels are provided for vertical stacking.

  • TypeError – If any argument in ‘other’ is not of type SQLDataModel, list, or tuple.

  • SQLProgrammingError – If an error occurs when updating the model values in place.

Example:

import sqldatamodel as sdm

# Create models A and B
df_a = sdm.SQLDataModel([('A', 1), ('B', 2)], headers=['A1', 'A2'])
df_b = sdm.SQLDataModel([('C', 3), ('D', 4)], headers=['B1', 'B2'])

# Vertically stack B onto A
df_ab = df_a.vstack(df_b)

# View stacked model
print(df_ab)

This will output the result of stacking B onto A, using the base model columns and dtypes:

┌─────┬─────┐
│ A1  │  A2 │
├─────┼─────┤
│ A   │   1 │
│ B   │   2 │
│ C   │   3 │
│ D   │   4 │
└─────┴─────┘
[4 rows x 2 columns]

Multiple models can be stacked simultaneously, here we vertically stack 3 models:

# Create a third model C
df_c = sdm.SQLDataModel([('E', 5), ('F', 6)], headers=['C1', 'C2'])

# Vertically stack all three models
df_abc = df_a.vstack([df_b, df_c])

# View stacked result
print(df_abc)

This will output the result of stacking B and C onto A:

┌─────┬─────┐
│ A1  │  A2 │
├─────┼─────┤
│ A   │   1 │
│ B   │   2 │
│ C   │   3 │
│ D   │   4 │
│ E   │   5 │
│ F   │   6 │
└─────┴─────┘
[6 rows x 2 columns]

Note

  • Headers and data types are inherited from the model calling the SQLDataModel.vstack() method, casting stacked values corresponding to the base model types.

  • Model dimensions will be truncated or padded to coerce compatible dimensions when stacking, use SQLDataModel.concat() for strict concatenation instead of vstack.

  • See SQLDataModel.insert_row() for inserting new values or types other than SQLDataModel directly into the current model.

  • See SQLDataModel.hstack() for horizontal stacking.
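The truncate-or-pad behaviour mentioned in the notes can be sketched on plain tuples. Both helpers below are hypothetical, and padding with None is an assumption made for illustration; the package may use a different fill value.

```python
def fit_row(row, width, fill=None):
    """Hypothetical helper: truncate or pad a row to the given width."""
    row = tuple(row)[:width]
    return row + (fill,) * (width - len(row))

def vstack_rows(base, *others):
    """Hypothetical helper: stack rows using the width of the base table."""
    width = len(base[0]) if base else 0
    stacked = [tuple(row) for row in base]
    for other in others:
        stacked.extend(fit_row(row, width) for row in other)
    return stacked

base = [('A', 1), ('B', 2)]
too_wide = [('C', 3, 'extra')]   # third value is dropped
too_narrow = [('E',)]            # missing value is padded

print(vstack_rows(base, too_wide, too_narrow))
```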

Changelog:
  • Version 0.3.4 (2024-04-05):
    • New method.

where(predicate: str) SQLDataModel[source]

Filters the rows of the current SQLDataModel object based on the specified SQL predicate and returns a new SQLDataModel containing only the rows that satisfy the condition. Only the predicate is needed, since the method prepends the select clause to form “select [current model columns] where [predicate]”; see below for detailed examples.

Parameters:

predicate (str) – The SQL predicate used for filtering rows that follows the ‘where’ keyword in a normal SQL statement.

Raises:
  • TypeError – If the provided predicate argument is not of type str.

  • SQLProgrammingError – If the provided string is invalid or malformed SQL when executed against the model.

Returns:

A new SQLDataModel containing rows that satisfy the specified predicate.

Return type:

SQLDataModel

Example:

import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Job']
data = [
    ('Billy', 30, 'Barber'),
    ('Alice', 28, 'Doctor'),
    ('John', 25, 'Technician'),
    ('Travis', 35, 'Musician'),
    ('William', 15, 'Student')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter model by 'Age' > 20
df_filtered = df.where('Age > 20')

# View result
print(df_filtered)

This will output:

┌───┬────────┬──────┬────────────┐
│   │ Name   │  Age │ Job        │
├───┼────────┼──────┼────────────┤
│ 0 │ Billy  │   30 │ Barber     │
│ 1 │ Alice  │   28 │ Doctor     │
│ 2 │ John   │   25 │ Technician │
│ 3 │ Travis │   35 │ Musician   │
└───┴────────┴──────┴────────────┘
[4 rows x 3 columns]

Filter by multiple parameters:

# Filter by 'Job' and 'Age'
df_filtered = df.where("Job = 'Student' and Age < 18")

# View result
print(df_filtered)

This will output:

┌───┬─────────┬──────┬─────────┐
│   │ Name    │  Age │ Job     │
├───┼─────────┼──────┼─────────┤
│ 4 │ William │   15 │ Student │
└───┴─────────┴──────┴─────────┘
[1 rows x 3 columns]

Note

  • predicate can be any valid SQL, for example ordering can be achieved without any filtering by simply using the argument '(1=1) order by "age" asc'

  • If predicate is not of type str, a TypeError is raised. If it is not valid SQL, a SQLProgrammingError is raised.
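How a predicate is combined with a prepended select clause can be reproduced with the standard sqlite3 module. This is a sketch under stated assumptions: the table name `people` and the `where` function are hypothetical and only mimic the statement shape described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table people ("Name" text, "Age" integer, "Job" text)')
conn.executemany(
    "insert into people values (?, ?, ?)",
    [
        ('Billy', 30, 'Barber'),
        ('Alice', 28, 'Doctor'),
        ('John', 25, 'Technician'),
        ('Travis', 35, 'Musician'),
        ('William', 15, 'Student'),
    ],
)

def where(predicate):
    """Hypothetical helper: prepend the select clause to the predicate."""
    if not isinstance(predicate, str):
        raise TypeError("predicate must be of type str")
    sql = f'select "Name", "Age", "Job" from people where {predicate}'
    return conn.execute(sql).fetchall()

# Plain filter
over_20 = where("Age > 20")

# Ordering without filtering, using an always-true condition
ordered = where('(1=1) order by "age" asc')

print(over_20)
print(ordered)
```

Because the predicate is interpolated directly into the statement, this pattern should only be used with trusted input.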

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

ANSIColor

class sqldatamodel.ansicolor.ANSIColor(text_color: str | tuple = None, text_bold: bool = False)[source]

Bases: object

Creates an ANSI style terminal color using provided hex color or rgb values.

Variables:
  • text_color (str or tuple) – Hex color code or RGB tuple.

  • text_bold (bool) – Whether text should be bold.

Raises:
  • ValueError – If provided string is not a valid hex color code or if provided rgb tuple is invalid.

  • TypeError – If provided text_color or text_bold parameters are of invalid types.

Example:

from sqldatamodel.ansicolor import ANSIColor

# Create a pen by specifying a color in hex or rgb:
green_bold = ANSIColor("#00ff00", text_bold=True)

# Create a string to use as a sample:
regular_str = "Hello, World!"

# Color the string using the wrap method:
green_str = green_bold.wrap(regular_str)

# Print the string in the terminal to see the color applied:
print(f"original string: {regular_str}, green string: {green_str}")

# Get rgb values from existing color
print(green_bold.to_rgb())  # Output: (0, 255, 0)

Changelog:
  • Version 0.10.2 (2024-06-30):
    • Added random color selection when initialized without a text_color argument.

    • Added dictionary of color values at ANSIColor.Colors to use as selection pool.

    • Modified ANSIColor.__repr__() to always return hex value as a string for consistency regardless of original input format.

__init__(text_color: str | tuple = None, text_bold: bool = False) None[source]

Initializes the ANSIColor object with the specified text color and bold setting, referred to as the ‘pen’ throughout documentation.

Parameters:
  • text_color (str or tuple) – Hex color code or RGB tuple. If not provided, a random color will be selected.

  • text_bold (bool) – Whether text should be bold (default: False)

Example:

from sqldatamodel.ansicolor import ANSIColor

# Initialize from hex value with normal weight
color = ANSIColor("#00ff00")

# Initialize from rgb value with bold weight
color = ANSIColor((0,255,0), text_bold=True)

# Surprise me! Initialize pen from random color
color = ANSIColor()

Changelog:
  • Version 0.10.2 (2024-06-30):
    • Modified to randomly select a color from ANSIColor.Colors when text_color = None for demonstration purposes.

__repr__() str[source]

Returns the string representation used for instances of ANSIColor, displayed in the pen's own color as set at ANSIColor.text_color_str and formatted to allow object recreation.

Returns:

The string representation as ANSIColor('hexvalue'), colored with the ANSI terminal color.

Return type:

str

Example:

from sqldatamodel.ansicolor import ANSIColor

# Create the pen from a hex value
color = ANSIColor('#EFAC65')

# View representation
print(color)

This will output:

ANSIColor('#EFAC65')

Creating a pen using the equivalent RGB tuple results in the same output:

# From the RGB equivalent values
color = ANSIColor((239, 172, 101))

# View representation
print(color)

This will also output:

ANSIColor('#EFAC65')

Note

  • The representation will always be formatted using the hex value for consistency and recreation.

  • Use ANSIColor.to_rgb() to view the RGB values for an existing pen.

classmethod rand_color() ANSIColor[source]

Create a new ANSIColor pen by randomly selecting one from a preexisting pool of options.

Returns:

A new ANSIColor instance created using a randomly selected color.

Return type:

ANSIColor

Example:

from sqldatamodel.ansicolor import ANSIColor

# Surprise me!
rand_color = ANSIColor.rand_color()

# See what we got
print(rand_color)

We got a nice orange color with this hex value:

ANSIColor('#F89F1F')

Note

  • See ANSIColor.Colors for dictionary of values being used as random color selection pool.

Changelog:
  • Version 0.10.2 (2024-06-30):
    • Added to allow a random color to be selected for sqldatamodel.SQLDataModel.set_display_color()

    • New method.

text_color_hex[source]

The hex value of color uppercased and prepended with ‘#’ to reflect hexadecimal format ranging from '#000000' to '#FFFFFF'.

Type:

str

text_color_rgb[source]

The RGB value of the color as a tuple of integers reflecting the (red, green, blue) values satisfying 0 <= value <= 255.

Type:

tuple[int, int, int]

text_color_str[source]

The input color used to create the pen in the originally provided format.

Type:

str

to_rgb() tuple[source]

Returns the text color attribute as a tuple in the format (r, g, b).

Returns:

RGB tuple.

Return type:

tuple

Example:

from sqldatamodel.ansicolor import ANSIColor

# Create the color
color = ANSIColor("#00ff00")

# Get the rgb values
print(color.to_rgb())  # Output: (0, 255, 0)

wrap(text: str) str[source]

Wraps the provided text in the style of the pen.

Parameters:

text (str) – Text to be wrapped.

Returns:

Wrapped text with ANSI escape codes.

Return type:

str

Example:

from sqldatamodel.ansicolor import ANSIColor

# Create the color
blue_color = ANSIColor("#0000ff")

# Create a sample string
message = "This string is currently unstyled"

# Wrap the string to change its styling whenever its printed
blue_message = blue_color.wrap(message)

# Print the styled message
print(blue_message)

# Or style string or string object directly in the print statement
print(blue_color.wrap("I'm going to turn blue!"))
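For readers curious about what wrapping text in ANSI escape codes involves, the following sketch converts a hex color to RGB and emits a 24-bit foreground color sequence. Both functions are hypothetical, and the exact escape sequence the class produces may differ from the one assumed here.

```python
def hex_to_rgb(hex_color):
    """Hypothetical helper: convert '#RRGGBB' into an (r, g, b) tuple."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"invalid hex color: {hex_color!r}")
    # int(..., 16) raises ValueError for non-hexadecimal characters
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))

def wrap_ansi(text, rgb, bold=False):
    """Hypothetical helper: wrap text in a 24-bit ANSI foreground color."""
    r, g, b = rgb
    weight = "1;" if bold else ""
    # '\033[0m' resets styling so later output is unaffected
    return f"\033[{weight}38;2;{r};{g};{b}m{text}\033[0m"

rgb = hex_to_rgb("#EFAC65")
print(rgb)
print(wrap_ansi("Hello, World!", rgb, bold=True))
```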

HTMLParser

class sqldatamodel.htmlparser.HTMLParser(*, convert_charrefs: bool = True, cell_sep: str = ' ', table_identifier: int | str = 1)[source]

Bases: HTMLParser

An HTML parser designed to extract tables from HTML content.

This parser subclasses HTMLParser from the standard library to parse HTML content. It extracts tables from the HTML and provides methods to access the table data.

Variables:
  • convert_charrefs (bool) – Flag indicating whether to convert character references to Unicode characters. Default is True.

  • cell_sep (str) – Separator string to separate cells within a row. Default is a single space.

  • table_identifier (int or str) – Identifier used to locate the target table. It can be either an integer representing the table index, or a string representing the HTML ‘name’ or ‘id’ attribute of the table.

  • _in_td (bool) – Internal flag indicating whether the parser is currently inside a <td> tag.

  • _in_th (bool) – Internal flag indicating whether the parser is currently inside a <th> tag.

  • _current_table (list) – List to hold the current table being parsed.

  • _current_row (list) – List to hold the current row being parsed.

  • _current_cell (list) – List to hold the current cell being parsed.

  • _ignore_next (bool) – Internal flag indicating whether the next token should be ignored.

  • found_target (bool) – Flag indicating whether the target table has been found.

  • _is_finished (bool) – Internal flag indicating whether parsing is finished.

  • table_counter (int) – Counter to keep track of the number of tables encountered during parsing.

  • target_table (list) – List to hold the data of the target table once found.

Changelog:
  • Version 0.9.0 (2024-06-26):
    • Modified integer indexing of table elements found to use one-based indexing instead of zero-based indexing to align with similar method usage across package.

handle_data(data: str) None[source]

Handle the data within an HTML tag during parsing.

Parameters:

data (str) – The data contained within the HTML tag.

handle_endtag(tag: str) None[source]

Handle the end of an HTML tag during parsing and modify the parsing tags accordingly.

Parameters:

tag (str) – The name of the HTML tag encountered.

handle_starttag(tag: str, attrs: list[str]) None[source]

Handle the start of an HTML tag during parsing.

Parameters:
  • tag (str) – The name of the HTML tag encountered.

  • attrs (list[str]) – A list of (name, value) pairs representing the attributes of the tag.

validate_table() None[source]

Validate and retrieve the target HTML table data based on table_identifier used for parsing.

Returns:

A tuple containing the table data and headers (if present).

Return type:

tuple[list, list|None]

Raises:

ValueError – If the target table is not found or cannot be parsed.

Note

  • SQLDataModel.from_html() uses this class to extract valid HTML tables from either web or file content.

  • If a row is found with mismatched dimensions, it will be filled with None values to ensure tabular output.
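The general approach of subclassing the standard library parser to collect table cells, then padding short rows with None, can be sketched as follows. `TableCellParser` is a hypothetical, much reduced class: it handles a single flat table and ignores table_identifier, nested tables and header detection.

```python
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
    """Hypothetical minimal parser collecting <td>/<th> text row by row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._row = None   # cells of the row being parsed
        self._cell = None  # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data.strip())

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join(part for part in self._cell if part))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

parser = TableCellParser()
parser.feed("<table><tr><th>Name</th><th>Age</th></tr><tr><td>Alice</td></tr></table>")
parser.close()

# Pad short rows with None so every row has the same width
width = max(len(row) for row in parser.rows)
padded = [row + [None] * (width - len(row)) for row in parser.rows]
print(padded)
```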

JSONEncoder

class sqldatamodel.jsonencoder.DataTypesEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: JSONEncoder

Custom JSON encoder that extends the functionality of json.JSONEncoder to handle additional data types.

Serialization:
  • datetime.date: Serialized as a string in the format ‘YYYY-MM-DD’.

  • datetime.datetime: Serialized as a string in the format ‘YYYY-MM-DD HH:MM:SS’.

  • bytes: Decoded to a string using UTF-8.

Note

  • The date and datetime types can be recovered using SQLDataModel.infer_dtypes() method.

  • Values serialized from bytes are not converted back into bytes when the JSON is loaded.

default(obj: Any)[source]

Override the default method to provide custom serialization for specific data types.

Parameters:

obj – The Python object to be serialized.

Returns:

The JSON-serializable representation of the object.
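The serialization rules listed for this encoder can be reproduced with a small json.JSONEncoder subclass. `SketchEncoder` below is a hypothetical stand-in written from the documented formats, not the package's source.

```python
import datetime
import json

class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder following the documented serialization formats."""

    def default(self, obj):
        # datetime must be checked first because it is a subclass of date
        if isinstance(obj, datetime.datetime):
            return obj.strftime("%Y-%m-%d %H:%M:%S")
        if isinstance(obj, datetime.date):
            return obj.strftime("%Y-%m-%d")
        if isinstance(obj, bytes):
            return obj.decode("utf-8")
        # Defer to the base class, which raises TypeError for unknown types
        return super().default(obj)

payload = {
    "day": datetime.date(2024, 3, 19),
    "ts": datetime.datetime(2024, 3, 19, 8, 30, 0),
    "raw": b"abc",
}

encoded = json.dumps(payload, cls=SketchEncoder)
print(encoded)
```

Passing the class through the `cls` argument of json.dumps means `default` is only consulted for objects the standard encoder cannot handle.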

StandardDeviation

class sqldatamodel.standarddeviation.StandardDeviation[source]

Bases: object

Implementation of standard deviation as an aggregate function for SQLite:

\[\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}\]
Where:
  • \(x_i\) represents each individual data point in the population.

  • \(\mu\) is the population mean.

  • \(N\) is the total number of data points in the population.

This class provides methods to calculate the standard deviation of a set of values in an SQLite query using the aggregate function mechanism.

Variables:
  • M (float) – The running mean of the values.

  • S (float) – The running sum of the squared differences from the mean.

  • k (int) – The count of non-null input values.

finalize() float[source]

Compute the final standard deviation as part of sqlite3 user-defined aggregate function.

Returns:

The computed standard deviation if the count is greater than or equal to 3, else None.

Return type:

float or None

Note

  • This returns the population standard deviation, not the sample standard deviation. It measures the spread or dispersion of a set of data points, computed over the entire population.

step(value)[source]

Update the running mean and sum of squared differences with a new value.

Parameters:

value (float) – The input value to be included in the calculation.

Note

  • If the input value is None, it will be ignored.
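The running mean M, running sum of squared differences S and count k suggest a Welford-style single-pass update, which can be registered with sqlite3 as an aggregate. The sketch below is an assumption-laden illustration: `PopulationStdDev` and the SQL function name `stdev` are hypothetical, and it returns None only for empty input, which may differ from the count threshold documented for finalize().

```python
import math
import sqlite3

class PopulationStdDev:
    """Hypothetical aggregate using a Welford-style running update."""

    def __init__(self):
        self.M = 0.0  # running mean
        self.S = 0.0  # running sum of squared differences from the mean
        self.k = 0    # count of non-null values

    def step(self, value):
        if value is None:
            return  # nulls are ignored
        self.k += 1
        delta = value - self.M
        self.M += delta / self.k
        self.S += delta * (value - self.M)

    def finalize(self):
        if self.k == 0:
            return None  # assumption: only empty input yields None here
        return math.sqrt(self.S / self.k)

conn = sqlite3.connect(":memory:")
conn.create_aggregate("stdev", 1, PopulationStdDev)
conn.execute("create table t (x real)")
conn.executemany(
    "insert into t values (?)",
    [(2,), (4,), (4,), (4,), (5,), (5,), (7,), (9,), (None,)],
)

result = conn.execute("select stdev(x) from t").fetchone()[0]
print(result)
```

The sample values have a mean of 5 and a population variance of 4, so the result should be approximately 2.0.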

utils

sqldatamodel.utils._create_connection(url: str) Connection | Any[source]

Parses database connection url into component parameters and creates the specified connection.

Parameters:

url (str) – The url connection string provided in the format of 'scheme://user:pass@host:port/path'

Raises:
  • ValueError – If scheme is provided and not one of the currently supported driver formats.

  • ModuleNotFoundError – If required driver for specified scheme is not installed or not found.

Returns:

The driver connection object for the scheme specified.

Return type:

Connection (sqlite3.Connection | Any)

Supported Formats:
  • SQLite using sqlite3 with format 'file:///path/to/database.db'

  • PostgreSQL using psycopg2 with format 'postgresql://user:pass@hostname:port/db'

  • SQL Server ODBC using pyodbc with format 'mssql://user:pass@hostname:port/db'

  • Oracle using cx_Oracle with format 'oracle://user:pass@hostname:port/db'

  • Teradata using teradatasql with format 'teradata://user:pass@hostname:port/db'

Examples:

SQLite

import sqldatamodel as sdm

# SQLite connection url
url = 'file:///home/database/users.db'

# Parse and create sqlite3 connection
conn = sdm.SQLDataModel._create_connection(url)

PostgreSQL

import sqldatamodel as sdm

# Sample url
url = 'postgresql://scott:tiger@12.34.56.78:5432/pgdb'

# Parse and create psycopg2 connection
conn = sdm.SQLDataModel._create_connection(url)

Note

  • Used by SQLDataModel.from_sql() and SQLDataModel.to_sql() to parse and create connection objects from url.

  • See SQLDataModel._parse_connection_url() for implementation on parsing url properties from connection string.

Changelog:
  • Version 0.9.2 (2024-06-27):
    • New method.

sqldatamodel.utils._parse_connection_url(url: str) NamedTuple[source]

Parses database connection url into component parameters and returns the parsed components as a NamedTuple.

Parameters:

url (str) – The url connection string provided in the format of 'scheme://user:pass@host:port/path'

Raises:
  • AttributeError – If url provided could not be parsed into expected component properties.

  • ValueError – If scheme is not provided or is not one of the currently supported driver formats or module aliases below:
    • SQLite: 'file' or 'sqlite3'
    • PostgreSQL: 'postgresql' or 'psycopg2'
    • SQL Server ODBC: 'mssql' or 'pyodbc'
    • Oracle: 'oracle' or 'cx_oracle'
    • Teradata: 'teradata' or 'teradatasql'

Returns:

The parsed details as ConnectionDetails('scheme', 'user', 'cred', 'host', 'port', 'db')

Return type:

ConnectionDetails

Supported Formats:
  • SQLite using sqlite3 with format 'file:///path/to/database.db'

  • PostgreSQL using psycopg2 with format 'postgresql://user:pass@hostname:port/db'

  • SQL Server ODBC using pyodbc with format 'mssql://user:pass@hostname:port/db'

  • Oracle using cx_Oracle with format 'oracle://user:pass@hostname:port/db'

  • Teradata using teradatasql with format 'teradata://user:pass@hostname:port/db'

Example:

import sqldatamodel as sdm

# SQLite connection url
url = 'file:///home/database/users.db'

# Parse the connection properties
url_props = sdm.SQLDataModel._parse_connection_url(url)

# View attributes
print(url_props)

This will output the connection details for a local SQLite database file:

ConnectionDetails(
    scheme='file', user=None, cred=None, host=None, port=None, db='/home/database/users.db'
)

PostgreSQL connections can be parsed from a valid format:

import sqldatamodel as sdm

# PostgreSQL connection url
url = 'postgresql://scott:tiger@12.34.56.78:5432/pgdb'

# Parse the connection properties
url_props = sdm.SQLDataModel._parse_connection_url(url)

# View attributes
print(url_props)

This will output the connection details for a PostgreSQL connection:

ConnectionDetails(
    scheme='postgresql', user='scott', cred='tiger', host='12.34.56.78', port=5432, db='pgdb'
)

Note

  • This method is used by SQLDataModel._create_connection() to parse details from url and create a connection object.

  • This method can be used by SQLDataModel.from_sql() and SQLDataModel.to_sql() to parse connection details when the connection parameter is provided as a string.
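Since the changelog mentions urllib.parse.urlparse, the parsing step can be approximated with the standard library as shown below. `parse_url` and this local `ConnectionDetails` definition are hypothetical; the sketch skips scheme validation and the alias handling described above.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse

class ConnectionDetails(NamedTuple):
    """Hypothetical container mirroring the documented field names."""
    scheme: Optional[str]
    user: Optional[str]
    cred: Optional[str]
    host: Optional[str]
    port: Optional[int]
    db: Optional[str]

def parse_url(url):
    """Hypothetical helper: split a connection url into its components."""
    parts = urlparse(url)
    # Keep the full path for local files, strip the leading slash otherwise
    db = parts.path if parts.scheme == "file" else parts.path.lstrip("/")
    return ConnectionDetails(
        scheme=parts.scheme or None,
        user=parts.username,
        cred=parts.password,
        host=parts.hostname,
        port=parts.port,
        db=db or None,
    )

print(parse_url('file:///home/database/users.db'))
print(parse_url('postgresql://scott:tiger@12.34.56.78:5432/pgdb'))
```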

Changelog:
  • Version 0.9.3 (2024-06-28):
    • Modified behavior when scheme is not provided, treating as file path when parsed in absence of auth related properties to retain prior version behavior of creating new sqlite3 database file when path is provided.

    • Added driver module names as valid aliases for relevant connection drivers, valid schemes now include ‘file’, ‘sqlite3’, ‘postgresql’, ‘psycopg2’, ‘mssql’, ‘pyodbc’, ‘oracle’, ‘cx_oracle’, ‘teradata’, ‘teradatasql’

  • Version 0.9.2 (2024-06-27):
    • Modified to use urllib.parse.urlparse instead of added 3rd party package dependency.

  • Version 0.9.1 (2024-06-27):
    • New method.

sqldatamodel.utils.alias_duplicates(headers: list) Generator[source]

Rename duplicate column names in a given list by appending an underscore and a numerical suffix.

Parameters:

headers (list) – A list of column names that require parsing for duplicates.

Yields:

Generator – A generator object that yields the original or modified column names.

Example:

import sqldatamodel as sdm

# Original list of column names with duplicates
original_headers = ['ID', 'Name', 'Amount', 'Name', 'Date', 'Amount']

# Use the static method to return a generator for the duplicates
renamed_generator = sdm.SQLDataModel.alias_duplicates(original_headers)

# Obtain the modified column names
modified_headers = list(renamed_generator)

# View modified column names
print(modified_headers)

# Output
modified_headers = ['ID', 'Name', 'Amount', 'Name_2', 'Date', 'Amount_2']

Example of implementation for SQLDataModel:

# Given a list of headers
original_headers = ['ID', 'ID', 'Name', 'Name', 'Name', 'Unique']

# Create a separate list for aliasing duplicates
aliased_headers = list(sdm.SQLDataModel.alias_duplicates(original_headers))

# View aliases
for col, alias in zip(original_headers, aliased_headers):
    print(f"{col} as {alias}")

This will output:

ID as ID
ID as ID_2
Name as Name
Name as Name_2
Name as Name_3
Unique as Unique

Note

  • Used by SQLDataModel.execute_fetch() when column selection is unknown and may require duplicate aliasing.
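The suffixing scheme shown in the examples can be sketched as a short generator. `alias_duplicates_sketch` is a hypothetical simplification: it counts occurrences only and does not re-alias headers that already carry a numeric suffix, which the changelog says the real function handles.

```python
def alias_duplicates_sketch(headers):
    """Hypothetical generator: suffix repeated names with _2, _3, ..."""
    counts = {}
    for name in headers:
        counts[name] = counts.get(name, 0) + 1
        # The first occurrence keeps its name, later ones get a suffix
        yield name if counts[name] == 1 else f"{name}_{counts[name]}"

original_headers = ['ID', 'ID', 'Name', 'Name', 'Name', 'Unique']
aliased = list(alias_duplicates_sketch(original_headers))
print(aliased)
```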

Changelog:
  • Version 0.3.4 (2024-04-05):
    • Modified to re-alias partially aliased input to prevent runaway incrementation on suffixes.

  • Version 0.1.9 (2024-03-19):
    • New method.

sqldatamodel.utils.flatten_json(json_source: list | dict, flatten_rows: bool = True, level_sep: str = '_', key_prefix: str = None) dict[source]

Parses raw JSON data and flattens it into a dictionary with optional normalization.

Parameters:
  • json_source (dict | list) – The raw JSON data to be parsed.

  • flatten_rows (bool) – If True, the data will be normalized into columns and rows. If False, columns will be concatenated from each row using the specified key_prefix.

  • level_sep (str) – Separates nested levels from other levels and used to concatenate prefix to column.

  • key_prefix (str) – The prefix to prepend to the JSON keys. If None, an empty string is used.

Returns:

A flattened dictionary representing the parsed JSON data.

Return type:

dict

Example:

import sqldatamodel as sdm

# Sample JSON
json_source = [{
    "alpha": "A",
    "value": 1
},
{
    "alpha": "B",
    "value": 2
},
{
    "alpha": "C",
    "value": 3
}]

# Flatten JSON with normalization
flattened_data = sdm.SQLDataModel.flatten_json(json_source, flatten_rows=True)

# Format of result
flattened_data = {"alpha": ['A','B','C'], "value": [1, 2, 3]}

# Alternatively, flatten columns without rows and adding a prefix
flattened_data = sdm.SQLDataModel.flatten_json(json_source, key_prefix='row_', flatten_rows=False)

# Format of result
flattened_data = {'row_0_alpha': 'A', 'row_0_value': 1, 'row_1_alpha': 'B', 'row_1_value': 2, 'row_2_alpha': 'C', 'row_2_value': 3}

Note

  • Used by SQLDataModel.from_dict() to flatten deeply nested JSON objects into 2 dimensions when encountered.
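The two output shapes described above, one column per key versus one key per nested path, can be illustrated with a small recursive function. `flatten` and `rows_to_columns` are hypothetical helpers; they assume every record has the same keys and make no attempt to match the package's prefix handling exactly.

```python
def flatten(obj, prefix="", sep="_"):
    """Hypothetical helper: flatten nested dicts and lists into one dict."""
    flat = {}
    items = obj.items() if isinstance(obj, dict) else enumerate(obj)
    for key, value in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            # Recurse, carrying the accumulated key path as the prefix
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

def rows_to_columns(records, sep="_"):
    """Hypothetical helper: normalize a list of records into columns."""
    columns = {}
    for record in records:
        for name, value in flatten(record, sep=sep).items():
            columns.setdefault(name, []).append(value)
    return columns

json_source = [
    {"alpha": "A", "value": 1},
    {"alpha": "B", "value": 2},
    {"alpha": "C", "value": 3},
]

print(rows_to_columns(json_source))
print(flatten(json_source, prefix="row"))
```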

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

sqldatamodel.utils.generate_html_table_chunks(html_source: str) Generator[str, None, None][source]

Generate one chunk of HTML content for each <table> element found in the provided source, yielding every table as a complete, unbroken chunk for parsing.

Parameters:

html_source (str) – The raw HTML content from which to generate chunks.

Raises:

ValueError – If zero <table> elements were found in html_source provided.

Yields:

str – Chunks of HTML content containing complete <table> elements.

Example:

from sqldatamodel.utils import generate_html_table_chunks

# HTML content to chunk
html_source = '''
<html>
    <table><tr><td>Table 1</td></tr></table>
    ...
    <p>Non-table elements</p>
    ...
    <table><tr><td>Table 2</td></tr></table>
</html>
'''

# Generate and view the returned table chunks
for chunk in generate_html_table_chunks(html_source):
    print('Chunk:', chunk)

This will output:

Chunk: <table><tr><td>Table 1</td></tr></table>
Chunk: <table><tr><td>Table 2</td></tr></table>

Note

  • HTML content before the first <table> element and after the last </table> element is ignored and not yielded.

  • See SQLDataModel.from_html() for full implementation and how this function is used for HTML parsing.
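The chunking behavior can be approximated with a regular expression. The following is a rough sketch under stated assumptions, not the library's parser: `table_chunks_sketch` is a hypothetical name, and the non-greedy pattern does not handle nested tables.

```python
import re
from typing import Generator

# Non-greedy match from an opening <table ...> to the nearest </table>
TABLE_PATTERN = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)


def table_chunks_sketch(html_source: str) -> Generator[str, None, None]:
    """Yield each complete <table>...</table> span; raise if none are found."""
    found = False
    for match in TABLE_PATTERN.finditer(html_source):
        found = True
        yield match.group(0)
    if not found:
        # Raised on first iteration, since generators run lazily
        raise ValueError("no <table> elements found in html_source")


html = (
    "<html><p>before</p>"
    "<table><tr><td>Table 1</td></tr></table>"
    "<p>Non-table elements</p>"
    "<table><tr><td>Table 2</td></tr></table>"
    "<p>after</p></html>"
)
chunks = list(table_chunks_sketch(html))
```

Content outside the matched spans is never yielded, which mirrors the first note above.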

Changelog:
  • Version 0.2.1 (2024-03-24):
    • New method.

sqldatamodel.utils.infer_str_type(obj: str, date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') str[source]

Infer the data type of the input object.

Parameters:
  • obj (str) – The object for which the data type is to be inferred.

  • date_format (str) – The format string to use for parsing date values. Default is ‘%Y-%m-%d’.

  • datetime_format (str) – The format string to use for parsing datetime values. Default is ‘%Y-%m-%d %H:%M:%S’.

Returns:

The inferred data type.

Return type:

str

Inference:
  • 'str': If the input object is a string, or cannot be parsed as another data type.

  • 'date': If the input object represents a date without time information.

  • 'datetime': If the input object represents a datetime with both date and time information.

  • 'int': If the input object represents an integer.

  • 'float': If the input object represents a floating-point number.

  • 'bool': If the input object represents a boolean value.

  • 'bytes': If the input object represents a binary array.

  • 'None': If the input object is None, empty, or not a string.

Note

  • This method attempts to infer the data type of the input object by evaluating its content.

  • If the input object is a string, it is parsed to determine whether it represents a date, datetime, integer, or float.

  • If the input object is not a string or cannot be parsed, its type is determined based on its Python type (bool, int, float, bytes, or None).
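One way to picture this kind of inference is a cascade of parse attempts that returns the first label that fits. The sketch below is an assumption about the general approach only: `infer_str_type_sketch` is a hypothetical name, and the order of checks and the bool and bytes rules are illustrative rather than taken from the library.

```python
from datetime import datetime


def infer_str_type_sketch(obj, date_format="%Y-%m-%d", datetime_format="%Y-%m-%d %H:%M:%S"):
    """Return a type label for a string by trying progressively looser parses."""
    if not isinstance(obj, str) or obj.strip() == "":
        return "None"
    text = obj.strip()
    if text.lower() in ("true", "false"):  # assumed boolean spellings
        return "bool"
    for label, parser in (("int", int), ("float", float)):
        try:
            parser(text)
            return label
        except ValueError:
            pass
    # Assumed bytes format: X'<HEX>' or x'<HEX>' with a valid hexadecimal body
    if len(text) >= 3 and text[:2] in ("x'", "X'") and text.endswith("'"):
        try:
            bytes.fromhex(text[2:-1])
            return "bytes"
        except ValueError:
            pass
    for label, fmt in (("date", date_format), ("datetime", datetime_format)):
        try:
            datetime.strptime(text, fmt)
            return label
        except ValueError:
            pass
    return "str"


labels = [
    infer_str_type_sketch(v)
    for v in ["42", "3.14", "2024-03-19", "2024-03-19 10:30:00", "true", "x'DEADBEEF'", "hello", ""]
]
```

Anything that survives every parse attempt falls through to 'str', matching the fallback described in the inference list.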

Changelog:
  • Version 2.3.2 (2026-01-23):
    • Modified check for possible bytes to include the lowercase prefix format x’<BYTES>’.

  • Version 2.3.0 (2026-01-21):
    • Added additional check for possible bytes data if obj matches the format X’<BYTES>’, where the bytes are valid hexadecimal.

  • Version 0.1.9 (2024-03-19):
    • New method.

sqldatamodel.utils.infer_types_from_data(input_data: list[list], date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') list[str][source]

Infer the best types of input_data by using a simple presence-based voting scheme. Sampling is assumed to have happened before the function call, so input_data is treated as an already-sampled subset of the original data.

Parameters:
  • input_data (list[list]) – A list of lists containing the input data.

  • date_format (str) – The format string to use for parsing date values. Default is ‘%Y-%m-%d’.

  • datetime_format (str) – The format string to use for parsing datetime values. Default is ‘%Y-%m-%d %H:%M:%S’.

Returns:

A list representing the best-matching inferred types for each column based on the sampled data.

Return type:

list[str]

Note

  • If multiple types are present in the samples for a column, the type is resolved using the rules below.

  • If a column contains both date and datetime instances, the type is inferred as datetime.

  • If a column contains both int and float instances, the type is inferred as float.

  • If a column contains only str instances or multiple types with no clear choice, the type remains as str.

  • See infer_str_type() for the per-value type determination process.
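The resolution rules above can be sketched as a small function over per-cell type labels. This is an illustration only: `resolve_column_types` is a hypothetical name, it assumes each cell has already been labeled by a per-value inference step, and ignoring 'None' labels when voting is an assumption not stated in the notes.

```python
def resolve_column_types(labels_by_row):
    """Pick one type per column from the set of labels present in that column."""
    resolved = []
    for column in zip(*labels_by_row):
        present = set(column) - {"None"}  # assumed: missing values do not vote
        if len(present) == 1:
            resolved.append(present.pop())
        elif present == {"date", "datetime"}:
            resolved.append("datetime")  # datetime can represent both
        elif present == {"int", "float"}:
            resolved.append("float")  # float can represent both
        else:
            resolved.append("str")  # no clear choice, or no labels at all
    return resolved


labels = [
    ["int", "date", "str", "int"],
    ["float", "datetime", "int", "None"],
]
column_types = resolve_column_types(labels)
```

Because only the presence of a label matters, not how often it occurs, a single float among many ints is enough to widen the column to float in this sketch.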

Changelog:
  • Version 0.1.9 (2024-03-19):
    • New method.

sqldatamodel.utils.sqlite_cast_type_format(param: str = '?', dtype: Literal['None', 'int', 'float', 'str', 'bytes', 'date', 'datetime', 'NoneType', 'bool'] = 'str', as_binding: bool = True, as_alias: bool = False)[source]

Formats param so that it is cast consistently to the specified Python type, either as a binding parameter for inserts or as a named alias.

Parameters:
  • param (str) – The parameter to be formatted.

  • dtype (Literal['None', 'int', 'float', 'str', 'bytes', 'date', 'datetime', 'NoneType', 'bool']) – The python data type of the parameter as a string.

  • as_binding (bool, optional) – Whether to format as a binding parameter (default is True).

  • as_alias (bool, optional) – Whether to include an alias for the parameter (default is False).

Returns:

The parameter formatted for SQL type casting.

Return type:

str

Note

  • This function provides consistent formatting for casting parameters into specific data types for SQLite; changing it will lead to unexpected behavior.

  • Used by SQLDataModel.__init__() with as_binding=True to allow parameterized inserts to cast to appropriate data type.
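To make the idea of casting inside a parameterized insert concrete, here is a small sketch using the standard sqlite3 module. The exact strings the library returns are not shown in this section, so `cast_format_sketch` and its `CAST_TEMPLATES` mapping are hypothetical and cover only a few dtypes.

```python
import sqlite3

# Hypothetical mapping from Python type names to SQLite cast expressions
CAST_TEMPLATES = {
    "int": "CAST({p} AS INTEGER)",
    "float": "CAST({p} AS REAL)",
    "str": "CAST({p} AS TEXT)",
    "date": "date({p})",
    "datetime": "datetime({p})",
}


def cast_format_sketch(param="?", dtype="str", as_alias=False):
    """Wrap param in a cast expression, optionally aliased back to its own name."""
    expr = CAST_TEMPLATES.get(dtype, "{p}").format(p=param)
    return f"{expr} AS {param}" if as_alias else expr


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("n" INTEGER, "x" REAL, "d" TEXT)')

# Each placeholder carries its own cast, so string inputs land as typed values
placeholders = ", ".join(cast_format_sketch("?", t) for t in ("int", "float", "date"))
conn.execute(f"INSERT INTO t VALUES ({placeholders})", ("42", "3.5", "2024-04-03"))
row = conn.execute('SELECT "n", "x", "d" FROM t').fetchone()

aliased = cast_format_sketch('"n"', "int", as_alias=True)
conn.close()
```

Here the values are bound as strings, and the casts in the statement decide the stored types, which is the general effect the notes above describe for parameterized inserts.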

Changelog:
  • Version 2.3.2 (2026-01-23):
    • Further modified handling for the bytes dtype to parse values that mix ASCII-encoded bytes and escaped hexadecimal bytes.

  • Version 2.3.0 (2026-01-21):
    • Added support for alternate bytes format X’<BYTES>’ so that binary data is correctly handled when formatted as hexadecimal and prefixed by either ‘b’ or ‘X’.

  • Version 0.7.6 (2024-06-16):
    • Added support for additional date formats when dtype='date' including: '%m/%d/%Y', '%m-%d-%Y', '%m.%d.%Y', '%Y/%m/%d', '%Y-%m-%d', '%Y.%m.%d'.

    • Modified behavior when dtype='bytes' to avoid the need for any additional checks after insert.

  • Version 0.3.3 (2024-04-03):
    • New method.

sqldatamodel.utils.sqlite_printf_format(column: str, dtype: str, max_pad_width: int, float_precision: int = 4, alignment: str = None, escape_newline: bool = False, truncation_chars: str = '⠤⠄') str[source]

Formats a SQLite SELECT clause for a column based on its parameters so that fetched values arrive preformatted, supplying most of the formatting used for repr output.

Parameters:
  • column (str) – The name of the column.

  • dtype (str) – The data type of the column (‘float’, ‘int’, ‘bytes’, ‘index’, or ‘custom’).

  • max_pad_width (int) – The maximum width to pad the output.

  • float_precision (int, optional) – The precision for floating-point numbers (default is 4).

  • alignment (str, optional) – The alignment of the output (‘<’, ‘>’, or None for no alignment).

  • escape_newline (bool, optional) – Whether newline characters should be escaped when dtype = 'str'. Default is False.

  • truncation_chars (str, optional) – Truncation characters to use if column exceeds maximum width. Default is '⠤⠄'.

Returns:

The formatted SELECT clause for SQLite.

Return type:

str

Note

  • This function generates the SQLite SELECT clause for a single column only.

  • The output preformats the SELECT result to fit the repr method's tabular output.

  • The returned str is not valid SQL by itself; it represents only the select portion for a single column.
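As a rough picture of a preformatted fetch, the sketch below builds a single-column fragment around SQLite's built-in printf() function and runs it through sqlite3. It is not the library's output: `printf_select_sketch` is a hypothetical name, truncation and newline escaping are omitted, and treating a missing alignment as right-justified padding is an assumption.

```python
import sqlite3


def printf_select_sketch(column, dtype, max_pad_width, float_precision=4, alignment=None):
    """Build a single-column SELECT fragment that pads and formats in SQL."""
    flag = "-" if alignment == "<" else ""  # '-' left-justifies in printf
    quoted = '"' + column.replace('"', '""') + '"'
    if dtype == "float":
        spec = f"%{flag}{max_pad_width}.{float_precision}f"
    else:
        spec = f"%{flag}{max_pad_width}s"
    return f"printf('{spec}', {quoted}) AS {quoted}"


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("name" TEXT, "price" REAL)')
conn.execute("INSERT INTO t VALUES (?, ?)", ("tea", 3.14159))

# Fragments are joined into a full statement; each one alone is not valid SQL
fragments = [
    printf_select_sketch("name", "str", 6, alignment="<"),
    printf_select_sketch("price", "float", 8, float_precision=2, alignment=">"),
]
row = conn.execute(f"SELECT {', '.join(fragments)} FROM t").fetchone()
conn.close()
```

The fetched row already contains fixed-width strings, so the caller only has to join them with column separators to draw a table.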

Changelog:
  • Version 2.3.0 (2026-01-21):
    • Modified handling for bytes dtype to more closely align with conventional representation of X’<BYTES>’ instead of previous approach.

  • Version 0.11.0 (2024-07-05):
    • Added truncation_chars keyword argument to allow custom truncation characters when column value exceeds maximum width.

  • Version 0.10.4 (2024-07-03):
    • Added escape_newline keyword argument to escape newline characters and prevent wrapped lines when called by SQLDataModel.__repr__().

  • Version 0.7.0 (2024-06-08):
    • Added preemptive check for custom flag to pass through string formatting directly to support horizontally centered repr changes.

  • Version 0.1.9 (2024-03-19):
    • New method.