Modules & Documentation

SQLDataModel

class SQLDataModel.SQLDataModel.SQLDataModel(data: list[list] = None, headers: list[str] = None, dtypes: dict[str, str] = None, display_max_rows: int = None, min_column_width: int = 3, max_column_width: int = 38, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic', display_color: str = None, display_index: bool = True, display_float_precision: int = 2, infer_types: bool = False, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round'] = 'default')

Bases: object

SQLDataModel

Primary class for the package of the same name. Its meant to provide a fast & light-weight alternative to the common pandas, numpy and sqlalchemy setup for moving data in a source/destination agnostic manner. It is not an ORM, any modifications outside of basic joins, group bys and table alteration requires knowledge of SQL. The primary use-case envisaged by the package is one where a table needs to be ETL’d from location A to destination B with arbitrary modifications made if needed:

Summary

  • Extract your data from SQL, websites or HTML, parquet, JSON, CSV, pandas, numpy, pickle, python dictionaries, lists, etc.

  • Transform your data using raw SQL or any number of built-in methods covering some of the most used pandas data methods.

  • Load your data to any number of sources including popular SQL databases, CSV files, JSON, HTML, parquet, pickle, etc.

Usage

from SQLDataModel import SQLDataModel

# Lets grab a random table from Wikipedia
sdm = SQLDataModel.from_html("https://en.wikipedia.org/wiki/FIFA_World_Cup", table_identifier=6)

# Lets see what we found
print(sdm)

This will output:

┌───────────────┬──────┬──────┬──────────┬──────────┬──────┬──────┬───────┐
│ Confederation │  AFC │  CAF │ CONCACAF │ CONMEBOL │  OFC │ UEFA │ Total │
├───────────────┼──────┼──────┼──────────┼──────────┼──────┼──────┼───────┤
│ Teams         │   43 │   49 │       46 │       89 │    4 │  258 │   489 │
│ Top 16        │    9 │   11 │       15 │       37 │    1 │   99 │   172 │
│ Top 8         │    2 │    4 │        5 │       36 │    0 │  105 │   152 │
│ Top 4         │    1 │    1 │        1 │       23 │    0 │   62 │    88 │
│ Top 2         │    0 │    0 │        0 │       15 │    0 │   29 │    44 │
│ 4th           │    1 │    1 │        0 │        5 │    0 │   15 │    22 │
│ 3rd           │    0 │    0 │        1 │        3 │    0 │   18 │    22 │
│ 2nd           │    0 │    0 │        0 │        5 │    0 │   17 │    22 │
│ 1st           │    0 │    0 │        0 │       10 │    0 │   12 │    22 │
└───────────────┴──────┴──────┴──────────┴──────────┴──────┴──────┴───────┘
[9 rows x 8 columns]

Example:

from SQLDataModel import SQLDataModel

# For example, setup a source connection
source_db_conn = pyodbc.connect(...)

# A destination connection
destination_db_conn = sqlite3.connect(...)

# Grab your source table
sdm = SQLDataModel.from_sql("select * from source_table", source_db_conn)

# Modify it however you want, whether through plain SQL
sdm = sdm.execute_fetch('select "whatever", "i", "want" from "wherever_i_want" where "what_i_need" is not null ')

# Or through any number of built-in methods like filtering
sdm = sdm[sdm['create_date'] >= '2023-01-01']

# Or creating new columns
sdm['new_date'] = datetime.now()

# Or modifying existing ones
sdm['salary'] = sdm['salary'] * 2

# Or applying functions
sdm['user_id'] = sdm['user_id'].apply(lambda x: x**2)

# Or deduplicating
sdm = sdm.deduplicate(subset=['user_id','user_name'])

# Or iterate through it row-by-row and modify it
for idx, row in sdm.iter_tuples(index=True):
    if row['number'] % 2 == 0:
        row[idx,'odd_even'] = 'even'
    else:
        row[idx,'odd_even'] = 'odd'

# Or join it using any of the standard join operations
sdm = sdm_left.merge(sdm_right, how='left', left_on='id', right_on='id')

# Or group or aggregate the data:
sdm_agg = sdm.group_by(["first", "last", "position"])

# Or have your data imported and described for you
sdm = SQLDataModel.from_parquet('titanic.parquet').describe()

# View result
print(sdm)

This will output:

┌────────┬─────────────┬──────────┬────────┬────────┬───────┬────────┐
│ metric │ passengerid │ survived │ pclass │    sex │   age │   fare │
├────────┼─────────────┼──────────┼────────┼────────┼───────┼────────┤
│ count  │         891 │      891 │    891 │    891 │   714 │    891 │
│ unique │         891 │        2 │      3 │      2 │    88 │    248 │
│ top    │         891 │        0 │      3 │   male │    24 │   8.05 │
│ freq   │           1 │      549 │    491 │    577 │    30 │     43 │
│ mean   │         446 │        0 │      2 │    NaN │  29.7 │   32.2 │
│ std    │         257 │        0 │      0 │    NaN │ 14.53 │  49.69 │
│ min    │           1 │        0 │      1 │ female │  0.42 │      0 │
│ p25    │         223 │        0 │      2 │    NaN │     6 │    7.9 │
│ p50    │         446 │        0 │      3 │    NaN │    24 │  14.45 │
│ p75    │         669 │        1 │      3 │    NaN │    35 │     31 │
│ max    │         891 │        1 │      3 │   male │    80 │ 512.33 │
│ dtype  │         int │      int │    int │    str │ float │  float │
└────────┴─────────────┴──────────┴────────┴────────┴───────┴────────┘
[12 rows x 7 columns]

Move data quickly from one source or format to another:

# Load it to your destination database:
sdm.to_sql("new_table", destination_db_conn)

# Or any number of formats including:
sdm.to_csv("output.csv")
sdm.to_html("output.html")
sdm.to_json("output.json")
sdm.to_latex("output.tex")
sdm.to_markdown("output.md")
sdm.to_parquet("output.parquet")
sdm.to_pickle("output.sdm")
sdm.to_text("output.txt")
sdm.to_local_db("output.db")

# Reload it back again from more formats:
sdm = SQLDataModel.from_csv("output.csv")
sdm = SQLDataModel.from_dict(py_dict)
sdm = SQLDataModel.from_html("output.html")
sdm = SQLDataModel.from_json("output.json")
sdm = SQLDataModel.from_latex("output.tex")
sdm = SQLDataModel.from_markdown("output.md")
sdm = SQLDataModel.from_numpy(np_arr)
sdm = SQLDataModel.from_pandas(pd_df)
sdm = SQLDataModel.from_polars(pl_df)
sdm = SQLDataModel.from_parquet("output.parquet")
sdm = SQLDataModel.from_pickle("output.sdm")
sdm = SQLDataModel.from_sql("output", sqlite3.connect('output.db'))

Data Formats

SQLDataModel seamlessly interacts with a wide range of data formats providing a versatile platform for data extraction, conversion, and writing. Supported formats include:

  • Arrow: Convert to and from Apache Arrow format, pyarrow required.

  • CSV: Extract from and write to comma separated value, .csv, files.

  • Excel: Extract from and write to Excel .xlsx files, openpyxl required.

  • HTML: Extract from web and write to and from .html files including formatted string literals.

  • JSON: Extract from and write to .json files, JSON-like objects, or JSON formatted sring literals.

  • LaTeX: Extract from and write to .tex files, LaTeX formatted string literals.

  • Markdown: Extract from and write to .MD files, Markdown formatted string literals.

  • Numpy: Convert to and from numpy.ndarray objects, numpy required.

  • Pandas: Convert to and from pandas.DataFrame objects, pandas required.

  • Parquet: Extract from and write to .parquet files, pyarrow required.

  • Pickle: Extract from and write to .pkl files, package uses .sdm extension when pickling for SQLDataModel metadata.

  • Polars: Convert to and from polars.DataFrame objects, polars required.

  • SQL: Extract from and write to the following popular SQL databases:

    • SQLite: Using the built-in sqlite3 module.

    • PostgreSQL: Using the psycopg2 package.

    • SQL Server: Using the pyodbc package.

    • Oracle: Using the cx_Oracle package.

    • Teradata: Using the teradatasql package.

  • Text: Write to and from .txt files including other SQLDataModel string representations.

  • TSV or delimited: Write to and from files delimited by:

    • \t: Tab separated values or .tsv files.

    • \s: Single space or whitespace separated values.

    • ;: Semicolon separated values.

    • |: Pipe separated values.

    • :: Colon separated values.

    • ,: Comma separated values or .csv files.

  • Python objects:

    • dictionaries: Convert to and from collections of python dict objects.

    • lists: Convert to and from collections of python list objects.

    • tuples: Convert to and from collections of python tuple objects.

    • namedtuples: Convert to and from collections of namedtuples objects.

Pretty Printing

SQLDataModel also pretty prints your table in any color you specify, use SQLDataModel.set_display_color() and provide either a hex value or a tuple of rgb and print the table, example output:

┌───┬─────────────────────┬────────────┬─────────────┬────────┬─────────┐
│   │ full_name           │ date       │ country     │    pin │ service │
├───┼─────────────────────┼────────────┼─────────────┼────────┼─────────┤
│ 0 │ Pamela Berg         │ 2024-09-15 │ New Zealand │   3010 │    3.02 │
│ 1 │ Mason Hoover        │ 2024-01-23 │ Australia   │   6816 │    5.01 │
│ 2 │ Veda Suarez         │ 2023-09-04 │ Ukraine     │   1175 │    4.65 │
│ 3 │ Guinevere Cleveland │ 2024-03-22 │ New Zealand │   4962 │    3.81 │
│ 4 │ Vincent Mccoy       │ 2023-09-16 │ France      │   4446 │    2.95 │
│ 5 │ Holmes Kemp         │ 2024-11-13 │ Germany     │   9396 │    4.61 │
│ 6 │ Donna Mays          │ 2023-06-06 │ Costa Rica  │   8153 │    5.34 │
│ 7 │ Rama Galloway       │ 2023-09-22 │ Italy       │   3384 │    3.87 │
│ 8 │ Lucas Rodriquez     │ 2024-03-16 │ New Zealand │   3278 │    2.73 │
│ 9 │ Hunter Donaldson    │ 2023-06-30 │ Belgium     │   1593 │    4.58 │
└───┴─────────────────────┴────────────┴─────────────┴────────┴─────────┘

Note

  • No additional dependencies are installed with this package, however you will obviously need to have pandas or numpy to create pandas or numpy objects.

  • Use SQLDataModel.set_display_color() to modify the terminal color of the table, by default no color styling is applied.

  • Use SQLDataModel.get_supported_sql_connections() to view supported SQL connection packages, please reach out with any issues or questions, thanks!

static ErrorFormat(error: str) str

Formats an error message with ANSI color coding.

Parameters:

error – The error message to be formatted.

Returns:

The modified string with ANSI color coding, highlighting the error type in bold red.

Return type:

str

Example:

# Format the error message
formatted_error = SQLDataModel.ErrorFormat("ValueError: Invalid value provided.")

# Should display a colored string to display along with error or exception
print(formatted_error)
static SuccessFormat(success: str) str

Formats a success message with ANSI color coding.

Parameters:

success – The success message to be formatted.

Returns:

The modified string with ANSI color coding, highlighting the success source in bold green.

Return type:

str

Example:

# Message to format
formatted_success = SuccessFormat("FileCreated: The operation was successful with new file created.")

# Styled message to pass with error or exception
print(formatted_success)
static WarnFormat(warn: str) str

Formats a warning message with ANSI color coding.

Parameters:

warn – The warning message to be formatted.

Returns:

The modified string with ANSI color coding, highlighting the class name in bold yellow.

Return type:

str

Example:

# Warning to format
formatted_warning = WarnFormat("DeprecationWarning: This method is deprecated.")

# Styled message to pass with error or exception
print(formatted_warning)
__add__(value: str | int | float | SQLDataModel) SQLDataModel

Implements the + operator functionality for compatible SQLDataModel operations.

Parameters:

value (str | int | float | SQLDataModel) – The value to be added to each element in the SQLDataModel.

Raises:
  • TypeError – If the provided value is not a valid type (str, int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as addition) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the addition operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
sdm = SQLDataModel(data, headers)

# Perform scalar addition
sdm['x + 100'] = sdm['x'] + 100

# Perform vector addition using another column
sdm['x + y'] = sdm['x'] + sdm['y']

# View both results
print(sdm)

This will output:

┌─────┬─────┬─────────┬───────┐
│   x │   y │ x + 100 │ x + y │
├─────┼─────┼─────────┼───────┤
│   2 │  10 │     102 │    12 │
│   4 │  20 │     104 │    24 │
│   8 │  30 │     108 │    38 │
│  16 │  40 │     116 │    56 │
│  32 │  50 │     132 │    82 │
└─────┴─────┴─────────┴───────┘
[5 rows x 4 columns]

We can also use addition to concatenate strings:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['First', 'Last']
data = [['Alice', 'Smith'],['Bob', 'Johnson'],['Charlie', 'Hall'],['David', 'Brown']]

# Create the model
sdm = SQLDataModel(data, headers)

# Concatenate scalar character
sdm['Loud First'] = sdm['First'] + '!'

# Concatenate scalar and vector using existing columns
sdm['Full Name'] = sdm['First'] + ' ' + sdm['Last']

# View it
print(sdm)

This will output:

┌─────────┬─────────┬────────────┬──────────────┐
│ First   │ Last    │ Loud First │ Full Name    │
├─────────┼─────────┼────────────┼──────────────┤
│ Alice   │ Smith   │ Alice!     │ Alice Smith  │
│ Bob     │ Johnson │ Bob!       │ Bob Johnson  │
│ Charlie │ Hall    │ Charlie!   │ Charlie Hall │
│ David   │ Brown   │ David!     │ David Brown  │
└─────────┴─────────┴────────────┴──────────────┘
[4 rows x 4 columns]

Note

  • Mixing summands such as int + float will work, however an exception will be raised when attempting to perform addition on incompatible types such as str + float.

__eq__(other) SQLDataModel

Implements the is equal to operator == for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indicies.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

A new SQLDataModel containing the result rows of the operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
sdm = SQLDataModel(data, headers)

# Filter by 'gender' column
sdm = sdm[sdm['gender'] == 'Female']

# View result
print(sdm)

This will output:

┌───┬───────┬──────┬──────┬─────────┬────────────┬────────┐
│   │ first │ last │  age │ service │ hire_date  │ gender │
├───┼───────┼──────┼──────┼─────────┼────────────┼────────┤
│ 0 │ Kelly │ Lee  │   32 │    8.00 │ 2016-09-18 │ Female │
│ 1 │ Sarah │ West │   51 │    0.70 │ 2023-10-01 │ Female │
└───┴───────┴──────┴──────┴─────────┴────────────┴────────┘
[2 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indicies evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indicies evaluating to True.

  • All the equality operations return a python set object containing the row indicies which were returned from the evaluation.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering, multiple operations require parenthesis.

__floordiv__(value: int | float | SQLDataModel) SQLDataModel

Implements the // operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

  • ZeroDivisionError – If value is 0.

Returns:

A new SQLDataModel resulting from the floor division operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
sdm = SQLDataModel(data, headers)

# Perform scalar floor division
sdm['y // 10'] = sdm['y'] // 10

# Perform vector floor division using another column
sdm['y // x'] = sdm['y'] // sdm['x']

# View both results
print(sdm)

This will output:

┌─────┬─────┬─────────┬────────┐
│   x │   y │ y // 10 │ y // x │
├─────┼─────┼─────────┼────────┤
│   2 │  10 │       1 │      5 │
│   4 │  20 │       2 │      5 │
│   8 │  30 │       3 │      3 │
│  16 │  40 │       4 │      2 │
│  32 │  50 │       5 │      1 │
└─────┴─────┴─────────┴────────┘
[5 rows x 4 columns]

Note

  • Mixing divisor types such as int // float will work, however an exception will be raised when attempting to perform division on incompatible types such as str // float.

__ge__(other) SQLDataModel

Implements the greater than or equal to operator >= for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indicies.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

A new SQLDataModel containing the result rows of the operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
sdm = SQLDataModel(data, headers)

# Filter by 'hire_date' column
sdm = sdm[sdm['hire_date'] >= datetime.date(2020,1,1)]

# View result
print(sdm)

This will output:

┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐
│   │ first │ last   │  age │ service │ hire_date  │ gender │
├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ John  │ Smith  │   27 │    1.22 │ 2023-02-01 │ Male   │
│ 1 │ Mike  │ Harlin │   36 │    3.90 │ 2020-08-27 │ Male   │
│ 2 │ Sarah │ West   │   51 │    0.70 │ 2023-10-01 │ Female │
└───┴───────┴────────┴──────┴─────────┴────────────┴────────┘
[3 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indicies evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indicies evaluating to True.

  • All the equality operations return a python set object containing the row indicies which result from the evaluation.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering, multiple operations require parenthesis.

__getitem__(target_indicies) SQLDataModel

Retrieves a subset of the SQLDataModel based on the specified indices.

Parameters:

slc – Indices specifying the rows and columns to be retrieved. This can be an integer, a tuple, a slice, or a combination of these.

Raises:
  • ValueError – if there are issues with the specified indices, such as invalid row or column names.

  • TypeError – if the slc type is not compatible with indexing SQLDataModel.

  • IndexError – if the slc includes a range or int that is outside of the current row count or column count.

Returns:

An instance of SQLDataModel containing the selected subset of data.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Retrieve a specific row by index
subset_model = sdm[3]

# Retrieve multiple rows and specific columns using a tuple
subset_model = sdm[(1, 2, 5), ["first_name", "age", "job"]]

# Retrieve a range of rows and all columns using a slice
subset_model = sdm[2:7]

# Retrieve a single column by name
subset_model = sdm["first_name"]
Change Log:
  • Version 0.5.0 (2024-05-09):
    • Modified index retention behavior to pass through row indicies and avoid resetting view order.

Note

  • The slc parameter can be an integer, a tuple of disconnected row indices, a slice representing a range of rows, a string or list of strings representing column names, or a tuple combining row and column indices.

  • The returned SQLDataModel instance will contain the specified subset of rows and columns, retaining the row indicies of the original view.

__gt__(other) SQLDataModel

Implements the greater than operator > for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indicies.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

A new SQLDataModel containing the result rows of the operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
sdm = SQLDataModel(data, headers)

# Filter by 'service' column
sdm = sdm[sdm['service'] > 5.0]

# View result
print(sdm)

This will output:

┌───┬───────┬─────────┬──────┬─────────┬────────────┬────────┐
│   │ first │ last    │  age │ service │ hire_date  │ gender │
├───┼───────┼─────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │ Female │
│ 1 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │ Male   │
└───┴───────┴─────────┴──────┴─────────┴────────────┴────────┘
[2 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indicies evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indicies evaluating to True.

  • All the equality operations return a python set object containing the row indicies which were returned from the evaluation.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering, multiple operations require parenthesis.

__iadd__(value: str | int | float | SQLDataModel) SQLDataModel

Implements the += operator functionality for compatible SQLDataModel operations.

Parameters:

value (str | int | float | SQLDataModel) – The value to be added to each element in the SQLDataModel.

Raises:

TypeError – If the provided value is not a valid type (str, int, float, or SQLDataModel).

Returns:

The modified SQLDataModel after the addition operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['idx', 'first', 'last', 'age', 'service']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 39, 0.7),
    (2, 'mike', 'harlin', 36, 3),
    (3, 'pat', 'douglas', 42, 11.5)
]

# Create the model
sdm = SQLDataModel(data, headers)

# Modifying first name column with a bang!
sdm['first'] += '!'

# View model
print(sdm)

This will output:

┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john!  │ smith   │     27 │    1.22 │
│ 1 │ sarah! │ west    │     39 │    0.70 │
│ 2 │ mike!  │ harlin  │     36 │    3.00 │
│ 3 │ pat!   │ douglas │     42 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]
__idiv__(value: int | float | SQLDataModel) SQLDataModel

Implements the /= operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • ZeroDivisionError – If value of divisor is 0.

Returns:

The modified SQLDataModel after the division operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Budget'])

# Adjust existing column
sdm['Budget'] /= 52
__ifloordiv__(value: int | float | SQLDataModel) SQLDataModel

Implements the //= operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int or float).

  • ZeroDivisionError – If value is 0.

Returns:

A new SQLDataModel resulting from the floor division operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['x']
data = [[10],[20],[30],[40],[50]]

# Create the model
sdm = SQLDataModel(data, headers)

# Modify the existing column
sdm['x'] //= 3

# View result
print(sdm)

This will output:

┌───┬──────┐
│   │    x │
├───┼──────┤
│ 0 │    3 │
│ 1 │    6 │
│ 2 │   10 │
│ 3 │   13 │
│ 4 │   16 │
└───┴──────┘
[5 rows x 1 columns]
__imul__(value: int | float | SQLDataModel) SQLDataModel

Implements the *= operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to multiply each element in the SQLDataModel by.

Raises:

TypeError – If the provided value is not a valid type (int or float).

Returns:

The modified SQLDataModel after the multiplication operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Salary'])

# Give raises to all!
sdm['Salary'] *= 12
__init__(data: list[list] = None, headers: list[str] = None, dtypes: dict[str, str] = None, display_max_rows: int = None, min_column_width: int = 3, max_column_width: int = 38, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic', display_color: str = None, display_index: bool = True, display_float_precision: int = 2, infer_types: bool = False, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round'] = 'default')

Initializes a new instance of SQLDataModel.

Parameters:
  • data (list[list]) – The data to populate the model. Should be a list of lists or a list of tuples or a dictionary orientated by rows or columns.

  • headers (list[str]) – The column headers for the model. If not provided, default headers will be used.

  • dtypes (dict) – A dictionary specifying the data types for each column. Format: {‘column’: ‘dtype’}.

  • display_max_rows (int) – The maximum number of rows to display. If not provided, all rows will be displayed.

  • min_column_width (int) – The minimum width for each column. Default is 3.

  • max_column_width (int) – The maximum width for each column. Default is 38.

  • column_alignment (str) – The alignment for columns, must be ‘dynamic’, ‘left’, ‘center’ or ‘right’). Default is ‘dynamic’.

  • display_color (str|tuple|None) – The color for display as hex code string or rgb tuple.

  • display_index (bool) – Whether to display row indices. Default is True.

  • display_float_precision (int) – The number of decimal places to display for float values. Default is 2.

  • infer_types (bool) – Whether to infer the data types based on a randomly selected sample. Default is False, using first row to derive the corresponding type.

  • table_style (str) – The styling to use when representing the table, must be ‘ascii’, ‘bare’, ‘dash’, ‘default’, ‘double’, ‘list’, ‘markdown’, ‘outline’, ‘pandas’, ‘polars’, ‘postgresql’ or ‘round’. Default is ‘default’.

Raises:
  • ValueError – If data and headers are not provided, or if data is of insufficient length.

  • TypeError – If data or headers is not a valid type (list or tuple), or if dtypes is not a dictionary.

  • DimensionError – If the length of headers does not match the implied column count from the data.

  • SQLProgrammingError – If there’s an issue with executing SQL statements during initialization.

Example:

from SQLDataModel import SQLDataModel

# Create sample data
data = [('Alice', 20, 'F'), ('Bob', 25, 'M'), ('Gerald', 30, 'M')]

# Create the model with custom headers
sdm = SQLDataModel(data, headers=['Name','Age','Sex'])

# Display the model
print(model)
┌────────┬──────┬──────┐
│ Name   │  Age │ Sex  │
├────────┼──────┼──────┤
│ Alice  │   20 │ F    │
│ Bob    │   25 │ M    │
│ Gerald │   30 │ M    │
└────────┴──────┴──────┘
[3 rows x 3 columns]

Note

  • If data is not provided, an empty model is created with headers, at least one of data, headers or dtypes are required to instantiate the model.

  • If headers are not provided, default headers will be generated using the the format '0', '1', ..., N where N is the column count.

  • If dtypes is provided, it must be a dictionary with column names as keys and Python data types as string values, e.g., {‘first_name’: ‘str’, ‘weight’: ‘float’}

  • If infer_types = True and dtypes are provided, the order will be resolved by first inferring the types, then overriding the inferred types for each {col:type} provided in the dtypes argument. If one is not provided, then the inferred type will be used as a fallback.

__ipow__(value: int | float | SQLDataModel) SQLDataModel

Implements the **= operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to raise each element in the SQLDataModel to.

Raises:

TypeError – If the provided value is not a valid type (int or float).

Returns:

The modified SQLDataModel after the exponential operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Salary'])

# More raises!
sdm['Salary'] **= 2
__isub__(value: int | float | SQLDataModel) SQLDataModel

Implements the -= operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to subtract from each element in the SQLDataModel.

Raises:

TypeError – If the provided value is not a valid type (int, float, or SQLDataModel).

Returns:

The modified SQLDataModel after the subtraction operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

headers = ['idx', 'first', 'last', 'age', 'service']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 39, 0.7),
    (2, 'mike', 'harlin', 36, 3),
    (3, 'pat', 'douglas', 42, 11.5)
]

# Create the model
sdm = SQLDataModel(data, headers)

# Modifying age column in the best direction
sdm['age'] -= 10

# View model
print(sdm)

This will output:

┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john   │ smith   │     17 │    1.22 │
│ 1 │ sarah  │ west    │     29 │    0.70 │
│ 2 │ mike   │ harlin  │     26 │    3.00 │
│ 3 │ pat    │ douglas │     32 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]
__iter__() Iterator

Iterates over a range of rows in the SQLDataModel based on the current model’s row indices.

Raises:

StopIteration – When there are no more rows to return.

Yields:

tuple – A row fetched from the SQLDataModel.

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data, headers)

# Iterate through rows
for row in sdm:
    print(row)

This will output:

(0, 'John', 30, 175.3)
(1, 'Alice', 28, 162.0)
(2, 'Travis', 35, 185.8)

Note

__le__(other) SQLDataModel

Implements the less than or equal to operator <= for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indicies.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

A new SQLDataModel containing the result rows of the operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
sdm = SQLDataModel(data, headers)

# Filter by 'age' column
sdm = sdm[sdm['age'] <= 40]

# View result
print(sdm)

This will output:

┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐
│   │ first │ last   │  age │ service │ hire_date  │ gender │
├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ John  │ Smith  │   27 │    1.22 │ 2023-02-01 │ Male   │
│ 1 │ Kelly │ Lee    │   32 │    8.00 │ 2016-09-18 │ Female │
│ 2 │ Mike  │ Harlin │   36 │    3.90 │ 2020-08-27 │ Male   │
└───┴───────┴────────┴──────┴─────────┴────────────┴────────┘
[3 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indicies evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indicies evaluating to True.

  • All the equality operations return a python set object containing the row indicies which were returned from the evaluation.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering, multiple operations require parenthesis.

__len__() int

Returns the SQLDataModel.row_count property for the current SQLDataModel which represents the current number of rows in the model.

Returns:

The total number of rows in the SQLDataModel.

Return type:

int

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get current length
num_rows = len(sdm)

# View number
print(num_rows)

This will output:

1000
__lt__(other) SQLDataModel

Implements the less than operator < for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indicies.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

A new SQLDataModel containing the result rows of the operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
sdm = SQLDataModel(data, headers)

# Filter by 'age' column
sdm = sdm[sdm['age'] < 40]

# View result
print(sdm)

This will output:

┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐
│   │ first │ last   │  age │ service │ hire_date  │ gender │
├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ John  │ Smith  │   27 │    1.22 │ 2023-02-01 │ Male   │
│ 1 │ Kelly │ Lee    │   32 │    8.00 │ 2016-09-18 │ Female │
│ 2 │ Mike  │ Harlin │   36 │    3.90 │ 2020-08-27 │ Male   │
└───┴───────┴────────┴──────┴─────────┴────────────┴────────┘
[3 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indicies evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indicies evaluating to True.

  • All the equality operations return a python set object containing the row indicies which were returned from the evaluation.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering, multiple operations require parenthesis.

__mul__(value: int | float | SQLDataModel) SQLDataModel

Implements the * operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to multiply each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as multiplication) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the multiplication operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
sdm = SQLDataModel(data, headers)

# Perform scalar multiplication
sdm['x * 10'] = sdm['x'] * 10

# Perform vector multiplication using another column
sdm['x * y'] = sdm['x'] * sdm['y']

# View results
print(sdm)

This will output:

┌─────┬─────┬────────┬───────┐
│   x │   y │ x * 10 │ x * y │
├─────┼─────┼────────┼───────┤
│   2 │  10 │     20 │    20 │
│   4 │  20 │     40 │    80 │
│   8 │  30 │     80 │   240 │
│  16 │  40 │    160 │   640 │
│  32 │  50 │    320 │  1600 │
└─────┴─────┴────────┴───────┘
[5 rows x 4 columns]

Note

  • Mixing multipliers such as int * float will work, however an exception will be raised when attempting to perform multiplication on incompatible types such as str * float.

__ne__(other) SQLDataModel

Implements the not equal to operator != for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indicies.

Parameters:

other – The SQLDataModel or scalar (int, str, float) to compare with.

Returns:

A new SQLDataModel containing the result rows of the operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
sdm = SQLDataModel(data, headers)

# Filter by 'first' column
sdm = sdm[sdm['first'] != 'John']

# View result
print(sdm)

This will output:

┌───┬───────┬─────────┬──────┬─────────┬────────────┬────────┐
│   │ first │ last    │  age │ service │ hire_date  │ gender │
├───┼───────┼─────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │ Female │
│ 1 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │ Male   │
│ 2 │ Sarah │ West    │   51 │    0.70 │ 2023-10-01 │ Female │
│ 3 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │ Male   │
└───┴───────┴─────────┴──────┴─────────┴────────────┴────────┘
[4 rows x 6 columns]

Note

  • For scalar other (int, str, or float), compares each element with the scalar and returns the row indicies evaluating to True.

  • For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indicies evaluating to True.

  • All the equality operations return a python set object containing the row indicies which were returned from the evaluation.

  • All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.

  • Operations can be chained using standard set operators like & and | to allow complex filtering, multiple operations require parenthesis.

__pow__(value: int | float | SQLDataModel) SQLDataModel

Implements the ** operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The exponent value to raise each element in the SQLDataModel to.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as exponentiation) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the exponential operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['x', 'y']
data = [[2,1], [4,2], [8,3], [16,4], [32,5]]

# Create the model
sdm = SQLDataModel(data, headers)

# Perform scalar exponentiation
sdm['y ** 2'] = sdm['y'] ** 2

# Perform vector exponentiation using another column
sdm['x ** y'] = sdm['x'] ** sdm['y']

# View results
print(sdm)

This will output:

┌─────┬─────┬────────┬──────────┐
│   x │   y │ y ** 2 │   x ** y │
├─────┼─────┼────────┼──────────┤
│   2 │   1 │      1 │        2 │
│   4 │   2 │      4 │       16 │
│   8 │   3 │      9 │      512 │
│  16 │   4 │     16 │    65536 │
│  32 │   5 │     25 │ 33554432 │
└─────┴─────┴────────┴──────────┘
[5 rows x 4 columns]

Note

  • Mixing exponent types such as int ** float will work, however an exception will be raised when attempting to exponentiate incompatible types such as str ** float.

__repr__() str

Returns a pretty printed string representation of SQLDataModel formatted to the current terminal size.

Returns:

The string representation of the SQLDataModel instance output using display and format values set on instance.

Return type:

str

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27)
    ,(1, 'sarah', 'west', 29)
    ,(2, 'mike', 'harlin', 36)
    ,(3, 'pat', 'douglas', 42)
]

# Create the model
sdm = SQLDataModel(data,headers)

# Display the string representation
print(sdm)

This will output the default alignment, dynamically aligning columns based on their dtype, right-aligned for numeric, left otherwise:

┌───┬────────┬─────────┬────────┐
│   │ first  │ last    │    age │
├───┼────────┼─────────┼────────┤
│ 0 │ john   │ smith   │     27 │
│ 1 │ sarah  │ west    │     29 │
│ 2 │ mike   │ harlin  │     36 │
│ 3 │ pat    │ douglas │     42 │
└───┴────────┴─────────┴────────┘
[4 rows x 3 columns]

Using 'left' column alignment:

# Using left alignment instead
sdm.set_column_alignment("left")

# See difference
print(sdm)

This will output:

┌───┬────────┬─────────┬────────┐
│   │ first  │ last    │ age    │
├───┼────────┼─────────┼────────┤
│ 0 │ john   │ smith   │ 27     │
│ 1 │ sarah  │ west    │ 29     │
│ 2 │ mike   │ harlin  │ 36     │
│ 3 │ pat    │ douglas │ 42     │
└───┴────────┴─────────┴────────┘
[4 rows x 3 columns]

Using 'center' column alignment:

# Using center alignment instead
sdm.set_column_alignment("center")

# See difference
print(sdm)

This will output:

┌───┬────────┬─────────┬────────┐
│   │ first  │  last   │  age   │
├───┼────────┼─────────┼────────┤
│ 0 │  john  │  smith  │   27   │
│ 1 │ sarah  │  west   │   29   │
│ 2 │  mike  │ harlin  │   36   │
│ 3 │  pat   │ douglas │   42   │
└───┴────────┴─────────┴────────┘
[4 rows x 3 columns]

Using 'right' column alignment:

# Using right alignment instead
sdm.set_column_alignment("right")

# See difference
print(sdm)

This will output:

┌───┬────────┬─────────┬────────┐
│   │  first │    last │    age │
├───┼────────┼─────────┼────────┤
│ 0 │   john │   smith │     27 │
│ 1 │  sarah │    west │     29 │
│ 2 │   mike │  harlin │     36 │
│ 3 │    pat │ douglas │     42 │
└───┴────────┴─────────┴────────┘
[4 rows x 3 columns]

Note

__setitem__(target_indicies, update_values) None

Updates specified rows and columns in the SQLDataModel with the provided values.

Parameters:
  • target_indicies – Indices specifying the rows and columns to be updated. This can be an integer, a tuple, a slice, or a combination of these.

  • update_values – The values to be assigned to the corresponding model records. It can be of types: str, int, float, bool, bytes, list, tuple, or another SQLDataModel object.

Raises:
  • TypeError – If the update_values type is not compatible with SQL datatypes.

  • DimensionError – If there is a shape mismatch between targeted indicies and provided update values.

  • ValueError – If there are issues with the specified indices, such as invalid row or column names.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Job']
data = [
    ('Billy', 30, 'Barber'),
    ('Alice', 28, 'Doctor'),
    ('John', 25, 'Technician'),
    ('Travis', 35, 'Musician'),
    ('William', 15, 'Student')
]

# Create the model
sdm = SQLDataModel(data, headers)

# Update a specific row with new values
sdm[2] = ("John", 25, "Engineer")

# See result
print(sdm)

This will output:

┌───┬─────────┬──────┬──────────┐
│   │ Name    │  Age │ Job      │
├───┼─────────┼──────┼──────────┤
│ 0 │ Billy   │   30 │ Barber   │
│ 1 │ Alice   │   28 │ Doctor   │
│ 2 │ John    │   25 │ Engineer │
│ 3 │ Travis  │   35 │ Musician │
│ 4 │ William │   15 │ Student  │
└───┴─────────┴──────┴──────────┘
[5 rows x 3 columns]
# Update multiple rows and columns with a list of values
sdm[1:5, ["Name", "Age", "Job"]] = [("Alice", 30, "Manager"), ("Bob", 28, "Developer"), ("Charlie", 35, "Designer"), ("David", 32, "Analyst")]

# See result
print(sdm)

This will output:

┌───┬─────────┬──────┬───────────┐
│   │ Name    │  Age │ Job       │
├───┼─────────┼──────┼───────────┤
│ 0 │ Billy   │   30 │ Barber    │
│ 1 │ Alice   │   30 │ Manager   │
│ 2 │ Bob     │   28 │ Developer │
│ 3 │ Charlie │   35 │ Designer  │
│ 4 │ David   │   32 │ Analyst   │
└───┴─────────┴──────┴───────────┘
[5 rows x 3 columns]
# Create a new column "Hobby" and set the values
sdm["Hobby"] = [('Fishing',), ('Biking',), ('Computers',), ('Photography',), ('Studying',)]

# See result
print(sdm)

This will output:

┌───┬─────────┬──────┬───────────┬─────────────┐
│   │ Name    │  Age │ Job       │ Hobby       │
├───┼─────────┼──────┼───────────┼─────────────┤
│ 0 │ Billy   │   30 │ Barber    │ Fishing     │
│ 1 │ Alice   │   30 │ Manager   │ Biking      │
│ 2 │ Bob     │   28 │ Developer │ Computers   │
│ 3 │ Charlie │   35 │ Designer  │ Photography │
│ 4 │ David   │   32 │ Analyst   │ Studying    │
└───┴─────────┴──────┴───────────┴─────────────┘
[5 rows x 4 columns]

Note

  • If update_values is another SQLDataModel object, its data will be normalized using the SQLDataModel.data() method.

  • The target_indicies parameter can be an integer, a tuple of disconnected row indices, a slice representing a range of rows, a string or list of strings representing column names, or a tuple combining row and column indices.

  • Values can be single values or iterables matching the specified rows and columns.

__sub__(value: int | float | SQLDataModel) SQLDataModel

Implements the - operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to subtract from each element in the SQLDataModel.

Raises:
  • TypeError – If the provided value is not a valid type (int or float).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as subtraction) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

Returns:

A new SQLDataModel resulting from the subtraction operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
sdm = SQLDataModel(data, headers)

# Perform scalar subtraction
sdm['x - 100'] = sdm['x'] - 100

# Perform vector subtraction using another column
sdm['x - y'] = sdm['x'] - sdm['y']

# View both results
print(sdm)

This will output:

┌─────┬─────┬─────────┬───────┐
│   x │   y │ x - 100 │ x - y │
├─────┼─────┼─────────┼───────┤
│   2 │  10 │     -98 │    -8 │
│   4 │  20 │     -96 │   -16 │
│   8 │  30 │     -92 │   -22 │
│  16 │  40 │     -84 │   -24 │
│  32 │  50 │     -68 │   -18 │
└─────┴─────┴─────────┴───────┘
[5 rows x 4 columns]

Note

  • Mixing subtractors such as int + float will work, however an exception will be raised when attempting to perform subtraction on incompatible types such as str - float.

__truediv__(value: int | float | SQLDataModel) SQLDataModel

Implements the / operator functionality for compatible SQLDataModel operations.

Parameters:

value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.

Raises:
  • TypeError – If the provided value is not a valid type (int, float or SQLDataModel).

  • DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.

  • ZeroDivisionError – If value is 0.

Returns:

A new SQLDataModel resulting from the division operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
sdm = SQLDataModel(data, headers)

# Perform scalar division
sdm['y / 10'] = sdm['y'] / 10

# Perform vector division using another column
sdm['y / x'] = sdm['y'] / sdm['x']

# View both results
print(sdm)

This will output:

┌─────┬─────┬────────┬───────┐
│   x │   y │ y / 10 │ y / x │
├─────┼─────┼────────┼───────┤
│   2 │  10 │   1.00 │  5.00 │
│   4 │  20 │   2.00 │  5.00 │
│   8 │  30 │   3.00 │  3.75 │
│  16 │  40 │   4.00 │  2.50 │
│  32 │  50 │   5.00 │  1.56 │
└─────┴─────┴────────┴───────┘
[5 rows x 4 columns]

Note

  • Mixing divisor types such as int / float will work, however an exception will be raised when attempting to perform division on incompatible types such as str / float.

_generate_sql_stmt(columns: list[str] = None, rows: int | slice | tuple | str = None, index: bool = True, na_rep: str = None) str

Generate an SQL statement for fetching specific columns and rows from the model, duplicate column references are aliased in order of appearance.

Parameters:
  • columns (list of str, optional) – The list of columns to include in the SQL statement. If not provided, all columns from the model will be included.

  • rows (int, slice, tuple, optional) – The rows to include in the SQL statement. It can be an integer for a single row, a slice for a range of rows, or a tuple for specific row indices. If not provided, all rows will be included.

  • index (bool, optional) – If True, include the primary index column in the SQL statement.

  • na_rep (str, optional) – If provided, all null or empty string values are replaced with value.

Returns:

The generated SQL statement.

Return type:

str

Change Log:
  • Version 0.5.1 (2024-05-10):
    • Modified to allow rows argument to be provided directly as a string predicate to bypass numeric range-based selections.

  • Version 0.4.0 (2024-04-23):
    • Added nap_rep parameter to fill null or missing fields with provided value.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

_generate_table_style(style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round'] = None) tuple[tuple[str]]

Generates the character sets required for formatting SQLDataModel according to the value currently set at SQLDataModel.table_style.

Parameters:

style (Literal['ascii','bare','dash','default','double','list','markdown','outline','pandas','polars','postgresql','round'], optional) – The table style to return. Default is value set on SQLDataModel.table_style.

Returns:

A 4-tuple containing the characters required for top, middle, row and lower table sections.

Return type:

tuple[tuple[str]]

Change Log:
  • Version 0.3.10 (2024-04-16):
    • Added style parameter to allow use by SQLDataModel.to_text() to generate new formatting styles introduced in version 0.3.9.

Note

  • This method is called by SQLDataModel.__repr__() to parse the characters necessary for constructing the tabular representation of the SQLDataModel, any modifications or changes to this method may result in unexpected behavior.

_get_display_args(include_dtypes: bool = False) dict

Retrieves the current display configuration settings of the SQLDataModel with the correct kwargs for the class SQLDataModel.__init__() method.

Parameters:

include_dtypes (bool, optional) – Whether SQLDataModel.dtypes should be included in the result. Default is False, including only display arguments.

Returns:

A dictionary containing the display configuration settings in the format {'setting': 'value'}.

Return type:

dict

Display Properties:
Dtype Property:
  • SQLDataModel.dtypes: A dictionary mapping the current model’s columns to their corresponding Python data type.

Change Log:
  • Version 0.6.2 (2024-05-15):
_get_sql_create_stmt() str

Retrieves the SQL create table statement used to create the current SQLDataModel.

Returns:

The SQL create table statement for the SQLDataModel.

Return type:

str

Example:

from SQLDataModel import SQLDataModel

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27)
    ,(1, 'sarah', 'west', 29)
    ,(2, 'mike', 'harlin', 36)
    ,(3, 'pat', 'douglas', 42)
]

# Create the sample model
sdm = SQLDataModel(data,headers)

# Retrieve the create statement for the SQLDataModel
create_stmt = sdm._get_sql_create_stmt()

# Print the returned statement
print(create_stmt)

This will output:

CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY,"first" TEXT,"last" TEXT,"age" INTEGER)
_update_indicies() None

Updates the SQLDataModel.indicies and SQLDataModel.row_count properties of the SQLDataModel instance representing the current valid row indicies and count.

Returns:

None

Note

  • This method is called internally any time the SQLDataModel.row_count property is subject to change, or data manipulation requires updating the current values.

  • There is no reason to call this method manually unless the model has been changed outside of the standard instance methods.

_update_indicies_deterministic(row_index: int) None

Quick implementation to update the SQLDataModel.indicies and SQLDataModel.row_count properties of the SQLDataModel instance representing the current valid row indicies and count based on the last inserted rowid.

Returns:

None

Change Log:
  • Version 0.6.0 (2024-05-14):
    • New method, improves performance for updating row indicies when update is deterministic.

Note

_update_model_metadata(update_row_meta: bool = False) None

Generates and updates metadata information about the columns and optionally the rows in the SQLDataModel instance based on the current model.

Attributes updated:
Parameters:

update_row_meta (bool, optional) – If True, updates row metadata information; otherwise, retrieves column metadata only (default).

Returns:

None

Example:

from SQLDataModel import SQLDataModel

headers = ['idx', 'first', 'last', 'age', 'service_time']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 39, 0.7),
    (2, 'mike', 'harlin', 36, 3),
    (3, 'pat', 'douglas', 42, 11.5)
]

# Create the model with sample data
sdm = SQLDataModel(data, headers)

# View header master
print(sdm.header_master)

This will output:

{'first': ('TEXT', 'str', True, '<'),
 'last': ('TEXT', 'str', True, '<'),
 'age': ('INTEGER', 'int', True, '>'),
 'service_time': ('REAL', 'float', True, '>'),
 'idx': ('INTEGER', 'int', False, '>')}

Example Attributes Modified:

from SQLDataModel import SQLDataModel

headers = ['idx', 'first', 'last', 'age', 'service_time']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 0.7),
    (2, 'mike', 'harlin', 3),
    (3, 'pat', 'douglas', 11.5)
]

# Create the model with sample data
sdm = SQLDataModel(data, headers)

# Get current column count
num_cols_before = sdm.column_count

# Add new column
sdm['new_column'] = 'empty'

# Method is called behind the scenes
sdm._update_model_metadata()

# Get new column count
num_cols_after = sdm.column_count

# View difference
print(f"cols before: {num_cols_before}, cols after: {num_cols_after}")

Note

  • This method is called after operations that may modify the current model’s structure and require synchronization.

_update_rows_and_columns_with_values(rows_to_update: tuple[int] = None, columns_to_update: list[str] = None, values_to_update: list[tuple] = None) None

Generates and executes a SQL update statement to modify specific rows and columns with provided values in the SQLDataModel.

Parameters:
  • rows_to_update – A tuple of row indices to be updated. If set to None, it defaults to all rows in the SQLDataModel.

  • columns_to_update – A list of column names to be updated. If set to None, it defaults to all columns in the SQLDataModel.

  • values_to_update – A list of tuples representing values to update in the specified rows and columns.

Raises:
  • TypeError – If the values_to_update parameter is not a list or tuple.

  • DimensionError – If the shape of the provided values does not match the specified rows and columns.

  • SQLProgrammingError – If the values_to_update parameter contains invalid or SQL incompatible data.

Example:

from SQLDataModel import SQLDataModel

# Update specific rows and columns with provided values
sdm._update_rows_and_columns_with_values(
    rows_to_update=(1, 2, 3),
    columns_to_update=["column1", "column2"],
    values_to_update=[(10, 'A'), (20, 'B'), (30, 'C')]
)

# Create a new column named "new_column" with default values
sdm._update_rows_and_columns_with_values(
    columns_to_update=["new_column"],
    values_to_update=[(None,)] * sdm.row_count
)

Note

  • Used by SQLDataModel.__setitem__() to broadcast updates across row and column index ranges.

  • To create a new column, pass a single header item in a list to the columns_to_update parameter.

  • To copy an existing column, pass the corresponding data is a list of tuples to the values_to_update parameter.

add_column_with_values(column_name: str, value=None) None

Adds a new column with the specified column_name to the SQLDataModel. The new column is populated with the values provided in the value argument. If value is not provided (default), the new column is populated with NULL values.

Parameters:
  • column_name (str) – The name of the new column to be added.

  • value – The value to populate the new column. If None (default), the column is populated with NULL values. If a valid column name is provided, the values of that column will be used to fill the new column.

Raises:
  • DimensionError – If the length of the provided values does not match the number of rows in the model.

  • TypeError – If the data type of the provided values is not supported or translatable to an SQL data type.

Example:

from SQLDataModel import SQLDataModel

# Create model from data
sdm = SQLDataModel.from_csv('data.csv')

# Add new column with default value 42
sdm.add_column_with_values('new_column', value=42)

# Add new column by copying values from an existing column
sdm.add_column_with_values('new_column', value='existing_column')

Note

  • Many other methods, including SQLDataModel.__setitem__() rely on this method, therefore modifying it may cause unpredictable behavior.

  • Determination for when to copy existing versus when to assign string is value is done by SQLDataModel.__eq__() against both values

static alias_duplicates(headers: list) Generator

Rename duplicate column names in a given list by appending an underscore and a numerical suffix.

Parameters:

headers (list) – A list of column names that require parsing for duplicates.

Yields:

Generator – A generator object that yields the original or modified column names.

Example:

from SQLDataModel import SQLDataModel

# Original list of column names with duplicates
original_headers = ['ID', 'Name', 'Amount', 'Name', 'Date', 'Amount']

# Use the static method to return a generator for the duplicates
renamed_generator = SQLDataModel.alias_duplicates(original_headers)

# Obtain the modified column names
modified_headers = list(renamed_generator)

# View modified column names
print(modified_headers)

# Output
modified_headers = ['ID', 'Name', 'Amount', 'Name_2', 'Date', 'Amount_2']

Example of implementation for SQLDataModel:

# Given a list of headers
original_headers = ['ID', 'ID', 'Name', 'Name', 'Name', 'Unique']

# Create a separate list for aliasing duplicates
aliased_headers = list(SQLDataModel.alias_duplicates(original_headers))

# View aliases
for col, alias in zip(original_headers, aliased_headers):
    print(f"{col} as {alias}")

This will output:

ID as ID
ID as ID_2
Name as Name
Name as Name_2
Name as Name_3
Unique as Unique
Change Log:
  • Version 0.3.4 (2024-04-05):
    • Modified to re-alias partially aliased input to prevent runaway incrementation on suffixes.

append_row(values: list | tuple = None) None

Appends values as a new row in the SQLDataModel at the next available index based on the current max row index from SQLDataModel.indicies. If values = None, an empty row with SQL null values will be used.

Parameters:

values (list or tuple, optional) – The values to be inserted into the row. If not provided or set to None, an empty row with SQL null values will be inserted.

Raises:
  • TypeError – If values is provided and is not of type list or tuple.

  • DimensionError – If the number of values provided does not match the current column count.

  • SQLProgrammingError – If there is an issue with the SQL execution during the insertion.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create a rowless model
sdm = SQLDataModel(headers=['Name', 'Age'])

# Append a row with values
sdm.append_row(['Alice', 31])

# Append another row
sdm.append_row(['John', 48])

# View result
print(sdm)

This will output:

┌───┬───────┬──────┐
│   │ Name  │ Age  │
├───┼───────┼──────┤
│ 0 │ Alice │ 31   │
│ 1 │ John  │ 48   │
└───┴───────┴──────┘
[2 rows x 2 columns]
Change Log:

Note

  • If no values are provided, None or SQL ‘null’ will be used for the values.

  • Rows will be appended to the bottom of the model at one index greater than the current max index.

apply(func: Callable) SQLDataModel

Applies func to the current SQLDataModel object and returns a modified SQLDataModel by passing its current values to the argument of func updated with the output.

Parameters:

func (Callable) – A callable function to apply to the SQLDataModel.

Raises:
  • TypeError – If the provided argument for func is not a valid callable.

  • SQLProgrammingError – If the provided function is not valid based on the current SQL datatypes.

Returns:

A modified SQLDataModel resulting from the application of func.

Return type:

SQLDataModel

Examples:

Applying to Single Column
from SQLDataModel import SQLDataModel

# Create the SQLDataModel:
sdm = SQLDataModel.from_csv('employees.csv', headers=['First Name', 'Last Name', 'City', 'State'])

# Create the function:
def uncase_name(x):
    return x.lower()

# Apply to existing column:
sdm['First Name'] = sdm['First Name'].apply(uncase_name) # existing column will be updated with new values

# Or create new one by passing in a new column name:
sdm['New Column'] = sdm['First Name'].apply(uncase_name) # new column will be created with returned values
Applying to Multiple Columns
from SQLDataModel import SQLDataModel

# Create the function, note that ``func`` must have the same number of args as the model ``.apply()`` is called on:
def summarize_employee(first, last, city, state)
    summary = f"{first} {last} is from {city}, {state}"

# Create a new 'Employee Summary' column for the returned values:
sdm['Employee Summary'] = sdm.apply(summarize_employee)
Applying a Built-in Function
import math
from SQLDataModel import SQLDataModel

# Create the SQLDataModel:
sdm = SQLDataModel.from_csv('number-data.csv', headers=['Number'])

# Apply the math.sqrt function to the original 'Number' column:
sdm_sqrt = sdm.apply(math.sqrt)
Applying a Lambda Function
from SQLDataModel import SQLDataModel

# Create the SQLDataModel:
sdm = SQLDataModel.from_csv('example.csv', headers=['Column1', 'Column2'])

# Create a new 'Column3' using the values returned from the lambda function:
sdm['Column3'] = sdm.apply(lambda x, y: x + y, new_column_name='Sum_Columns')

Note

  • The number of args in the inspected signature of func must equal the current number of SQLDataModel columns.

  • The number of func args must match the current number of columns in the model, or an Exception will be raised.

  • Use SQLDataModel.generate_apply_function_stub() method to return a preconfigured template using current SQLDataModel columns and dtypes to assist.

apply_function_to_column(func: Callable, column: str | int) None

Applies the specified callable function (func) to the provided SQLDataModel column. The function’s output is used to update the values in the column. For broader uses or more input flexibility, see related method apply().

Parameters:
  • func (Callable) – The callable function to apply to the column.

  • column (str | int) – The name or index of the column to which the function will be applied.

Raises:
  • TypeError – If the provided column argument is not a valid type (str or int).

  • IndexError – If the provided column index is outside the valid range of column indices.

  • ValueError – If the provided column name is not valid for the current model.

  • SQLProgrammingError – If the provided function return types or arg count is invalid or incompatible to SQL types.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('data.csv')

# Apply upper() method using lambda function to column ``name``
sdm.apply_function_to_column(lambda x: x.upper(), column='name')

# Apply addition through lambda function to column at index 1
sdm.apply_function_to_column(lambda x, y: x + y, column=1)

Note

  • This method is a simplified version of the SQLDataModel.apply() method, which can be used for arbitrary function params and inputs.

  • If providing a function name, ensure it can be used a valid sqlite3 identifier for the instance’s connection otherwise SQLProgrammingError will be raised.

astype(dtype: Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str']) SQLDataModel

Casts the model data into the specified python dtype.

Parameters:

dtype (Literal['bool', 'bytes', 'datetime', 'float', 'int', 'None', 'str']) – The target python data type to cast the values to.

Raises:

ValueError – If dtype is not one of ‘bool’, ‘bytes’, ‘datetime’, ‘float’, ‘int’, ‘None’, ‘str’.

Returns:

The data casted as the specified type as a new SQLDataModel.

Return type:

SQLDataModel

Warning

  • Type casting will coerce any nonconforming values to the dtype being set, this means data will be lost if casting values to incompatible types.

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height', 'Hired']
data = [
    ('John', 30, 175.3, 'True'),
    ('Alice', 28, 162.0, 'True'),
    ('Travis', 35, 185.8, 'False')
]

# Create the model
sdm = SQLDataModel(data, headers)

# See what we're working with
print(sdm)

This will output:

┌────────┬──────┬─────────┬───────┐
│ Name   │  Age │  Height │ Hired │
├────────┼──────┼─────────┼───────┤
│ John   │   30 │  175.30 │ True  │
│ Alice  │   28 │  162.00 │ True  │
│ Travis │   35 │  185.80 │ False │
└────────┴──────┴─────────┴───────┘
[3 rows x 4 columns]

We can return the values as new types or save them to a column:

# Convert the string based 'Hired' column to boolean values
sdm['Hired'] = sdm['Hired'].astype('bool')

# Let's also create a new 'Height' column, this time as an integer
sdm['Height int'] = sdm['Height'].astype('int')

# See the new values and their types
print(sdm)

This will output:

┌────────┬──────┬─────────┬───────┬────────────┐
│ Name   │  Age │  Height │ Hired │ Height int │
├────────┼──────┼─────────┼───────┼────────────┤
│ John   │   30 │  175.30 │ 1     │        175 │
│ Alice  │   28 │  162.00 │ 1     │        162 │
│ Travis │   35 │  185.80 │ 0     │        185 │
└────────┴──────┴─────────┴───────┴────────────┘
[3 rows x 5 columns]

Note

  • Unless the returned values are saved as a new column, using this method does not change the underlying column’s type currently assigned to it, to modify the column type use SQLDataModel.set_column_dtypes() instead.

  • Any None or null values encountered will not be coerced to the specified dtype, see SQLDataModel.fillna() for handling and filling null values appropriately.

column_alignment

The column alignment to use for string representations of the data, value must be one of ['dynamic','left','center','right'] Default is 'dynamic', using right-alignment for numeric columns and left-aligned for all others.

Type:

str

column_count

The current column count of the model.

Type:

int

concat(other: SQLDataModel | list | tuple, inplace: bool = True) None | SQLDataModel

Concatenates the provided data to SQLDataModel along the row axis, returning a new model or modifying the existing instance inplace.

Parameters:
  • other (SQLDataModel | list | tuple) – The SQLDataModel, list, or tuple to concatenate or append.

  • inplace (bool, optional) – If True (default), performs the concatenation in-place, modifying the current model. If False, returns a new SQLDataModel instance with the concatenated result.

Returns:

None when inplace = True and SQLDataModel when in_place = False

Return type:

None or SQLDataModel

Raises:
  • TypeError – If the other argument is not one of type SQLDataModel, list, or tuple.

  • ValueError – If other is a list or tuple with insufficient data where the column dimension is < 1.

  • DimensionError – If the column count of the current model does not match the column count of the other model or tuple.

Example:

from SQLDataModel import SQLDataModel

# Datasets a and b
data_a = (['A', 1], ['B', 2])
data_b = (['C', 3], ['D', 4])

# Create the models
sdm_a = SQLDataModel(data_a, headers=['letter', 'number'])
sdm_b = SQLDataModel(data_b, headers=['letter', 'number'])

# Concatenate the two models
sdm_ab = sdm_a.concat(sdm_b, inplace=False)

# View result
print(sdm_ab)

This will output:

┌────────┬────────┐
│ letter │ number │
├────────┼────────┤
│ A      │ 1      │
│ B      │ 2      │
│ C      │ 3      │
│ D      │ 4      │
└────────┴────────┘
[4 rows x 2 columns]
# List or tuples can also be used directly
data_e = ['E', 5]

# Append in place
sdm_ab.concat(data_e)

# View result
print(sdm_ab)

This will output:

┌───┬────────┬────────┐
│   │ letter │ number │
├───┼────────┼────────┤
│ 0 │ A      │      1 │
│ 1 │ B      │      2 │
│ 2 │ C      │      3 │
│ 3 │ D      │      4 │
│ 4 │ E      │      5 │
└───┴────────┴────────┘
[5 rows x 2 columns]

Note

  • Models must be of compatible dimensions with equal column_count or equivalent dimension if list or tuple

  • Headers are inherited from the model calling the SQLDataModel.concat() method whether done inplace or being returned as new instance.

copy(data_only: bool = False) SQLDataModel

Returns a deep copy of the current model as a new SQLDataModel.

Parameters:

data_only (bool) – If True, only the data is copied, otherwise display and styling parameters are included. Default is False.

Returns:

A cloned copy from the original as a new SQLDataModel.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the original model with list styling
sdm = SQLDataModel(data, headers, table_style='list')

# Create two copies, one full and one with data only
copy_full = sdm.copy()
copy_data = sdm.copy(data_only=True)

# View both copies
print(copy_full)
print(copy_data)

This will output both copies, with copy_full including any styling parameters such as table_style='list':

   Name    Age   Height
-  ------  ---  -------
0  John     30   175.30
1  Alice    28   162.00
2  Travis   35   185.80

With the output for copy_data containing only the original model’s data:

┌───┬────────┬─────┬─────────┐
│   │ Name   │ Age │  Height │
├───┼────────┼─────┼─────────┤
│ 0 │ John   │  30 │  175.30 │
│ 1 │ Alice  │  28 │  162.00 │
│ 2 │ Travis │  35 │  185.80 │
└───┴────────┴─────┴─────────┘

Note

count() SQLDataModel

Returns a new SQLDataModel containing the counts of non-null values for each column in a row-wise orientation.

Returns:

A new SQLDataModel containing the counts of non-null values in each column.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data with missing values
headers = ['Name', 'Age', 'Gender', 'Tenure']
data = [
    ('Alice', 25, 'Female', 1.0),
    ('Bob', None, 'Male', 2.7),
    ('Charlie', 30, 'Male', None),
    ('David', None, 'Male', 3.8)
]

# Create the model
sdm = SQLDataModel(data, headers)

# Get counts
counts = sdm.count()

# View result
print(counts)

This will output the count of all non-null values for each column:

┌──────┬─────┬────────┬────────┐
│ Name │ Age │ Gender │ Tenure │
├──────┼─────┼────────┼────────┤
│    4 │   2 │      4 │      3 │
└──────┴─────┴────────┴────────┘
[1 rows x 4 columns]

Note

count_unique() SQLDataModel

Returns a new SQLDataModel containing the total counts and unique values for each column in the model for both null and non-null values.

Metrics:
  • 'column' contains the names of the columns counted.

  • 'na' contains the total number of null values in the column.

  • 'unique' contains the total number of unique values in the column.

  • 'count' contains the total number of non-null values in the column.

  • 'total' contains the total number of all null and non-null values in the column.

Returns:

A new SQLDataModel containing columns ‘column’, ‘unique’, and ‘count’ representing the column name, total unique values, and total values count, respectively.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Gender']
data = [
    ('Alice', 25, 'Female'),
    ('Bob', 30, None),
    ('Alice', 25, 'Female')
]

# Create the model
sdm = SQLDataModel(data, headers)

# Get the value count information
count_model = sdm.count_unique()

# View the count information
print(count_model)

This will output:

┌────────┬──────┬────────┬───────┬───────┐
│ column │   na │ unique │ count │ total │
├────────┼──────┼────────┼───────┼───────┤
│ Name   │    0 │      2 │     3 │     3 │
│ Age    │    0 │      2 │     3 │     3 │
│ Gender │    1 │      1 │     2 │     3 │
└────────┴──────┴────────┴───────┴───────┘
[3 rows x 5 columns]
Change Log:
  • Version 0.3.2 (2024-04-02):
    • Renamed method from counts to count_unique for more precise definition.

Note

  • See SQLDataModel.count() for the count of non-null values for each column in a row-wise orientation.

data(index: bool = False, include_headers: bool = False, strict_2d: bool = False) list[tuple]

Returns the SQLDataModel data as a list of tuples for multiple rows, a single tuple for individual rows, as a single item for individual cells. Data is returned without index and headers by default, use include_headers=True or index=True to modify.

Parameters:
  • index (bool, optional) – If True, includes the index in the result; if False, excludes the index. Default is False.

  • include_headers (bool, optional) – If True, includes column headers in the result; if False, excludes headers. Default is False.

  • strict_2d (bool, optional) – If True, returns data as a 2-dimensional list of tuples regardless of data dimension. Default is False.

Returns:

The data currently stored in the model as a list of tuples.

Return type:

list[tuple]

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data, headers, display_float_precision=2)

# View full table
print(sdm)

This will output:

┌────────┬──────┬─────────┐
│ Name   │  Age │  Height │
├────────┼──────┼─────────┤
│ John   │   30 │  175.30 │
│ Alice  │   28 │  162.00 │
│ Travis │   35 │  185.80 │
└────────┴──────┴─────────┘
[3 rows x 3 columns]

Get data for specific row:

# Grab data from single row
row_data = sdm[0].data()

# View it
print(row_data)

This will output the row as a tuple of values:

('John', 30, 175.3)

Get data for specific column:

# Grab data from single column
col_data = sdm['Name'].data()

# View it
print(col_data)

This will output the column values as a list of tuples:

[('John',), ('Alice',), ('Travis',)]
Change Log:
  • Version 0.5.0 (2024-05-09):
    • Added strict_2d parameter to allow predictable return type regardless of data dimension.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

  • Many other SQLDataModel methods rely on this method, changing it will lead to undefined behavior.

  • See related SQLDataModel.from_data() for creating a new SQLDataModel from existing data sources.

  • Use strict_2d = True to always return data as a list of tuples regardless of data dimension.

deduplicate(subset: list[str] = None, reset_index: bool = True, keep_first: bool = True, inplace: bool = True) None | SQLDataModel

Removes duplicate rows from the SQLDataModel based on the specified subset of columns. Deduplication occurs inplace by default, otherwise use inplace=False to return a new SQLDataModel.

Parameters:
  • subset (list[str], optional) – List of columns to consider when identifying duplicates. If None, all columns are considered. Defaults to None.

  • reset_index (bool, optional) – If True, resets the index after deduplication starting at 0; otherwise retains current indicies.

  • keep_first (bool, optional) – If True, keeps the first occurrence of each duplicated row; otherwise, keeps the last occurrence. Defaults to True.

  • inplace (bool, optional) – If True, modifies the current SQLDataModel in-place; otherwise, returns a new SQLDataModel without duplicates. Defaults to True.

Raises:

ValueError – If a column specified in subset is not found in the SQLDataModel.

Returns:

If inplace = True the method modifies the current SQLDataModel in-place return None, otherwise if inplace = False a new SQLDataModel is returned.

Return type:

None or SQLDataModel

Examples:

Based on Single Column
from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Deduplicate based on a specific column
sdm.deduplicate(subset='ID', keep_first=True, inplace=True)
Based on Multiple Columns
from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Deduplicate based on multiple columns and save to keep both models
sdm_deduped = sdm.deduplicate(subset=['ID', 'Name'], keep_first=False, inplace=False)

Note

  • Ordering for keep_first is determined by the current SQLDataModel.sql_idx order of the instance.

  • For multiple columns ordering is done sequentially favoring first index in subset, then i+1, …, to i+len(subset)

describe(exclude_columns: str | list = None, exclude_dtypes: list[Literal['str', 'int', 'float', 'date', 'datetime', 'bool']] = None, ignore_na: bool = True, **kwargs) SQLDataModel

Generates descriptive statistics for columns in the SQLDataModel instance based on column dtype including count, unique values, top value, frequency, mean, standard deviation, minimum, 25th, 50th, 75th percentiles, maximum and dtype for specified column.

Parameters:
  • exclude_columns (str | list, optional) – Columns to exclude from the analysis. Default is None.

  • exclude_dtypes (list[Literal["str", "int", "float", "date", "datetime", "bool"]], optional) – Data types to exclude from the analysis. Default is None.

  • ignore_na (bool, optional) – If True, ignores NA like values (‘NA’, ‘ ‘, ‘None’) when computing statistics. Default is True.

  • **kwargs – Additional keyword arguments to be passed to the execute_fetch method.

Statistics Described:
  • count: Total number of non-null values for specified column

  • unique: Total number of unique values for specified column

  • top: Top value represented for specified column, ties broken arbitrarily

  • freq: Frequency of corresponding value represented in ‘top’ metric

  • mean: Mean as calculated by summing all values and dividing by ‘count’

  • std: Standard Deviation for specified column

    • Uncorrected sample standard deviation for int, float dtypes

    • Mean time difference represented in number of days for date, datetime dtypes

    • ‘NaN’ for all other dtypes

  • min: Minimum value for specified column

    • Least value for int, float dtypes

    • Least value sorted by alphabetical ascending for str dtypes

    • Earliest date or datetime for date, datetime dtypes

  • p25: Percentile, 25th

    • Max first bin value as determined by quartered binning of values for int, float dtypes

    • ‘NaN’ for all other dtypes

  • p50: Percentile, 50th

    • Max second bin value as determined by quartered binning of values for int, float dtypes

    • ‘NaN’ for all other dtypes

  • p75: Percentile, 75th

    • Max third bin value as determined by quartered binning of values for int, float dtypes

    • ‘NaN’ for all other dtypes

  • max: Maximum value for specified column

    • Greatest value for int, float dtypes

    • Greatest value sorted by alphabetical ascending for str dtypes

    • Latest date or datetime for date, datetime dtypes

  • dtype: Datatype of specified column

    • Python datatype as determined by relevant class __name__ attribute, e.g. ‘float’ or ‘int’

    • dtypes can be excluded by using exclude_dtypes parameter

Returns:

A new SQLDataModel containing a comprehensive set of descriptive statistics for selected columns.

Return type:

SQLDataModel

Note

  • Standard deviation is calculated using uncorrected sample standard deviation for numeric dtypes, and timediff in days for datetime dtypes

  • Ties in unique, top and freq columns are broken arbitrarily as determined by first ordering of values prior to calling describe()

  • Ties encountered when binning for p25, p50, p75 will favor lower bins for data that cannot be quartered cleanly

  • Metrics for count, min, p25, p50, p75 and max include non-null values only

  • Using ignore_na=True only affects inclusion of ‘NA like’ values such as empty strings

  • Floating point precision determined by SQLDataModel.display_float_precision attribute

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('employees.csv')

# View all 10 rows
print(sdm)

This will output:

┌───┬──────────────────┬────────────┬─────────────┬───────────────┬────────┬─────────────────────┐
│   │ name             │ hire_date  │ country     │ service_years │    age │ last_update         │
├───┼──────────────────┼────────────┼─────────────┼───────────────┼────────┼─────────────────────┤
│ 0 │ Pamela Berg      │ 2007-06-06 │ New Zealand │          3.02 │     56 │ 2023-08-12 17:13:46 │
│ 1 │ Mason Hoover     │ 2009-04-19 │ Australia   │          5.01 │     41 │ 2023-05-18 01:29:44 │
│ 2 │ Veda Suarez      │ 2007-07-02 │ Ukraine     │          4.65 │     26 │ 2023-12-09 15:38:01 │
│ 3 │ John Smith       │ 2017-08-12 │ New Zealand │          3.81 │     35 │ 2023-03-10 18:23:56 │
│ 4 │ Xavier McCoy     │ 2021-04-03 │ France      │          2.95 │     42 │ 2023-09-27 11:39:08 │
│ 5 │ John Smith       │ 2020-10-11 │ Germany     │          4.61 │     56 │ 2023-12-09 18:41:52 │
│ 6 │ Abigail Mays     │ 2021-07-25 │ Costa Rica  │          5.34 │     50 │ 2023-02-11 16:43:07 │
│ 7 │ Rama Galloway    │ 2009-02-09 │ Italy       │          3.87 │     24 │ 2023-03-13 16:08:48 │
│ 8 │ Lucas Rodriquez  │ 2018-06-19 │ New Zealand │          2.73 │     28 │ 2023-03-17 01:45:22 │
│ 9 │ Hunter Donaldson │ 2015-12-18 │ Belgium     │          4.58 │     43 │ 2023-04-06 03:22:54 │
└───┴──────────────────┴────────────┴─────────────┴───────────────┴────────┴─────────────────────┘
[10 rows x 6 columns]

Now that we have our SQLDataModel, we can generate some statistics:

# Generate statistics
sdm_described = sdm.describe()

# View stats
print(sdm_described)

This will output:

┌────────┬──────────────┬─────────────┬─────────────┬───────────────┬────────┬─────────────────────┐
│ metric │         name │   hire_date │     country │ service_years │    age │         last_update │
├────────┼──────────────┼─────────────┼─────────────┼───────────────┼────────┼─────────────────────┤
│ count  │           10 │          10 │          10 │            10 │     10 │                  10 │
│ unique │            9 │          10 │           8 │            10 │      9 │                  10 │
│ top    │   John Smith │  2021-07-25 │ New Zealand │          5.34 │     56 │ 2023-12-09 18:41:52 │
│ freq   │            2 │           1 │           3 │             1 │      2 │                   1 │
│ mean   │          NaN │  2014-11-24 │         NaN │          4.06 │     40 │ 2023-06-16 19:18:39 │
│ std    │          NaN │ 2164.4 days │         NaN │          0.92 │     11 │         117.58 days │
│ min    │ Abigail Mays │  2007-06-06 │   Australia │          2.73 │     24 │ 2023-02-11 16:43:07 │
│ p25    │          NaN │  2009-02-09 │         NaN │          3.02 │     28 │ 2023-03-13 16:08:48 │
│ p50    │          NaN │  2017-08-12 │         NaN │          4.58 │     42 │ 2023-05-18 01:29:44 │
│ p75    │          NaN │  2020-10-11 │         NaN │          4.65 │     50 │ 2023-09-27 11:39:08 │
│ max    │ Xavier McCoy │  2021-07-25 │     Ukraine │          5.34 │     56 │ 2023-12-09 18:41:52 │
│ dtype  │          str │        date │         str │         float │    int │            datetime │
└────────┴──────────────┴─────────────┴─────────────┴───────────────┴────────┴─────────────────────┘
[12 rows x 7 columns]

Specific columns or data types can be excluded from result:

# Set filters to exclude all str dtypes and the 'hire_date' column:
sdm_describe = sdm.describe(exclude_dtypes=['str'], exclude_columns=['hire_date'])

# View statistics
print(sdm_described)

This will output:

┌────────┬───────────────┬────────┬─────────────────────┐
│ metric │ service_years │    age │         last_update │
├────────┼───────────────┼────────┼─────────────────────┤
│ count  │            10 │     10 │                  10 │
│ unique │            10 │      9 │                  10 │
│ top    │          5.34 │     56 │ 2023-10-28 05:42:43 │
│ freq   │             1 │      2 │                   1 │
│ mean   │          4.06 │     40 │ 2023-08-11 23:18:12 │
│ std    │          0.92 │     11 │          73.15 days │
│ min    │          2.73 │     24 │ 2023-04-07 23:56:06 │
│ p25    │          3.02 │     28 │ 2023-06-02 14:36:19 │
│ p50    │          4.58 │     42 │ 2023-09-09 19:18:38 │
│ p75    │          4.65 │     50 │ 2023-10-09 19:34:55 │
│ max    │          5.34 │     56 │ 2023-10-28 05:42:43 │
│ dtype  │         float │    int │            datetime │
└────────┴───────────────┴────────┴─────────────────────┘
[12 rows x 4 columns]
Change Log:
  • Version 0.6.3 (2024-05-16):
    • Modified model to output values as string data types and set columns to right-aligned if arguments are not present in kwargs to retain metric resolution while having numeric alignment.

Important

  • Generally, do not rely on SQLDataModel to do statistics, use NumPy or a real scientific computing library instead.

Note

  • Use SQLDataModel.infer_dtypes() to cast columns to their apparent data type, or set it manually with SQLDataModel.set_column_dtypes() to convert columns to different data types.

  • Statistics for date and datetime can be unpredictable if formatting used is inconsistent with conversion to Julian days or if column data type is incorrect.

display_color

The display color to use for string representations of the model. Default is None, using the standard terminal color.

Type:

ANSIColor

display_float_precision

The floating point precision to use for string representations of the table, does not affect the actual floating point values stored in the model. Default is 2.

Type:

int

display_index

Determines whether the index column is displayed when string representations of the table are generated. Default is True.

Type:

bool

display_max_rows

The maximum number of rows to display. Default is 1,000 rows.

Type:

int

drop_column(column: int | str | list, inplace: bool = True) None | SQLDataModel

Drops the specified column(s) from the SQLDataModel. Values for column can be a single column name or index, or a list of multiple column names or indicies to drop from the model.

Parameters:
  • column (int | str | list) – The index, name, or list of indices/names of the column(s) to drop.

  • inplace (bool) – If True, drops the column(s) in-place and updates the model metadata. If False, returns a new SQLDataModel object without the dropped column(s) and does not modify the original object. Default is True.

Returns:

If inplace is True, returns None. Otherwise, returns a new SQLDataModel object without the dropped column(s).

Return type:

None | SQLDataModel

Raises:
  • TypeError – If the column parameter is not of type ‘int’, ‘str’, or a list containing equivalent types.

  • IndexError – If any provided column index is outside the current column range.

  • ValueError – If any provided column name is not found in the model’s headers.

Examples:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Gender', 'City']
data = [
    ('Alice', 30, 'Female', 'Milwaukee'),
    ('Sarah', 35, 'Female', 'Houston'),
    ('Mike', 28, 'Male', 'Atlanta'),
    ('John', 25, 'Male', 'Boston'),
    ('Bob', 22, 'Male', 'Chicago'),
]

# Create the model
sdm = SQLDataModel(data,headers)

# Drop the 'Gender' column
sdm.drop_column('Gender')

# View updated model
print(sdm)

This will output:

┌───────┬──────┬───────────┐
│ Name  │  Age │ City      │
├───────┼──────┼───────────┤
│ Alice │   30 │ Milwaukee │
│ Sarah │   35 │ Houston   │
│ Mike  │   28 │ Atlanta   │
│ John  │   25 │ Boston    │
│ Bob   │   22 │ Chicago   │
└───────┴──────┴───────────┘
[5 rows x 3 columns]

Dropping multiple columns:

# Drop first and last columns by index
sdm.drop_column([0,-1])

# View updated model
print(sdm)

This will output:

┌──────┬────────┐
│  Age │ Gender │
├──────┼────────┤
│   30 │ Female │
│   35 │ Female │
│   28 │ Male   │
│   25 │ Male   │
│   22 │ Male   │
└──────┴────────┘
[5 rows x 2 columns]

Drop columns and return as a new SQLDataModel:

# Drop the multiple columns and return as a new model
sdm = sdm.drop_column(['Age','Gender'], inplace=False)

# View updated model
print(sdm)

This will output:

┌───────┬───────────┐
│ Name  │ City      │
├───────┼───────────┤
│ Alice │ Milwaukee │
│ Sarah │ Houston   │
│ Mike  │ Atlanta   │
│ John  │ Boston    │
│ Bob   │ Chicago   │
└───────┴───────────┘
[5 rows x 2 columns]

Note

  • Arguments for column can be a single str or int or list[str] containing str or list[int] containing int representing column names or column indicies, respectively, but they cannot be combined and provided together. For example, passing columns = ['First Name', 3] will raise a TypeError exception.

  • The equivalent of this method can also be achieved by simply indexing the required rows and columns using sdm[rows, column] notation, see SQLDataModel.__getitem__() for additional details.

dtypes

The current model data types mapped to each column in the format of {'col': 'dtype'} where 'dtype' is a string representing the corresponding python type.

Type:

dict[str, str]

execute_fetch(sql_query: str, sql_params: tuple = None, **kwargs) SQLDataModel

Returns a new SQLDataModel object, including display and style properties, after executing the provided SQL query using the current SQLDataModel. This method is called by other methods which expect results to be returned from their execution.

Parameters:
  • sql_query (str) – The SQL query to execute with the expectation of rows returned.

  • **kwargs (optional) – Additional keyword args to pass to SQLDataModel constructor

Raises:
  • SQLProgrammingError – If the provided SQL query is invalid or malformed.

  • ValueError – If the provided SQL query was valid but returned 0 rows, which is insufficient to return a new model.

Returns:

A new SQLDataModel instance containing the result of the SQL query.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['Column1', 'Column2'])

# Create the SQL query to execute
query = 'SELECT * FROM sdm WHERE Column1 > 10'

# Fetch and save the result to a new instance
result_model = sdm.execute_fetch(query)

# Create a parameterized SQL query to execute
query = 'SELECT * FROM sdm WHERE Column1 > ? OR Column2 < ?'
params = (10, 20)

# Provide the SQL and the statement parameters
result_parameterized = sdm.execute_fetch(query, params)

Important

Change Log:
  • Version 0.6.2 (2024-05-15):

Note

  • Use SQLDataModel.set_model_name() to modify the table name used by the model, default name set as 'sdm'.

  • Display properties such as float precision, index column or table styling are also passed to the new instance when not provided in kwargs.

execute_statement(sql_stmt: str) None

Executes an arbitrary SQL query against the current model without the expectation of selection or returned rows.

Parameters:

sql_stmt (str) – The SQL query to execute.

Raises:

SQLProgrammingError – If the SQL execution fails.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('data.csv')

# Execute statement without results, modifying column in place
sdm.execute_statement('UPDATE table SET column = value WHERE condition')

Note

execute_transaction(sql_script: str) None

Executes a prepared SQL script wrapped in a transaction against the current model without the expectation of selection or returned rows.

Parameters:

sql_script (str) – The SQL script to execute within a transaction.

Raises:

SQLProgrammingError – If the SQL execution fails.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('data.csv')

# Script to update columns with predicate
transaction_script = '''
    UPDATE table1 SET column1 = value1 WHERE condition1;
    UPDATE table2 SET column2 = value2 WHERE condition2;
'''

# Execute the script
sdm.execute_transaction(transaction_script)

Note

  • If no errors were raised, the transaction was successfully executed as provided.

  • Parameters cannot be passed to as a transaction script in sqlite3, use SQLDataModel.execute_fetch() method if parameter bindings are required.

  • Many other methods heavily rely on the SQLDataModel.execute_transaction() method, therefore modifying it may adversly affect many other methods.

fillna(value, strictly_null: bool = False, inplace: bool = True) None | SQLDataModel

Fills missing (na or nan) values in the current SQLDataModel with the provided value inplace or as a new instance.

Parameters:
  • value – The scalar value to fill missing values with. Should be of type ‘str’, ‘int’, ‘float’, ‘bytes’, or ‘bool’.

  • inplace (bool) – If True, modifies the current instance in-place. If False, returns a new instance with missing values filled.

  • strictly_null (bool) – If True, only strictly null values are filled. If False, values like 'NA', 'NaN', 'n/a', 'na', and whitespace only strings are also filled.

Raises:

TypeError – If value is not a scalar type or is incompatible with SQLite’s type system.

Returns:

When inplace=True modifies model inplace, returning None, when inplace=False a new SQLDataModel is returned.

Return type:

None or SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Create sample data
data = [('Alice', 25, None), ('Bob', None, 'N/A'), ('Charlie', 'NaN', ' '), ('David', 30, 'NA')]

# Create the model
sdm = SQLDataModel(data, headers=['Name', 'Age', 'Status'])

# Fill missing values with 0
sdm_filled = sdm.fillna(value=0, strictly_null=False, inplace=False)

# View filled model
print(sdm_filled)

This will output:

┌───┬─────────┬──────┬────────┐
│   │ Name    │  Age │ Status │
├───┼─────────┼──────┼────────┤
│ 0 │ Alice   │   25 │      0 │
│ 1 │ Bob     │    0 │      0 │
│ 2 │ Charlie │    0 │      0 │
│ 3 │ David   │   30 │      0 │
└───┴─────────┴──────┴────────┘
[4 rows x 3 columns]

Note

  • The method supports filling missing values with various scalar types which are then adapted to the columns set dtype.

  • The strictly_null parameter controls whether additional values like ('NA', 'NAN', 'n/a', 'na', '') with last being an empty string, are treated as null.

static flatten_json(json_source: list | dict, flatten_rows: bool = True, level_sep: str = '_', key_prefix: str = None) dict

Parses raw JSON data and flattens it into a dictionary with optional normalization.

Parameters:
  • json_source (dict | list) – The raw JSON data to be parsed.

  • flatten_rows (bool) – If True, the data will be normalized into columns and rows. If False,

  • key_prefix. (columns will be concatenated from each row using the specified) –

  • level_sep (str) – Separates nested levels from other levels and used to concatenate prefix to column.

  • key_prefix (str) – The prefix to prepend to the JSON keys. If None, an empty string is used.

Returns:

A flattened dictionary representing the parsed JSON data.

Return type:

dict

Example:

from SQLDataModel import SQLDataModel

# Sample JSON
json_source = [{
    "alpha": "A",
    "value": 1
},
{
    "alpha": "B",
    "value": 2
},
{
    "alpha": "C",
    "value": 3
}]

# Flatten JSON with normalization
flattened_data = flatten_json(json_data, flatten_rows=True)

# Format of result
flattened_data = {"alpha": ['A','B','C'], "value": [1, 2, 3]}

# Alternatively, flatten columns without rows and adding a prefix
flattened_data = flatten_json(raw_input,key_prefix='row_',flatten_rows=False)

# Format of result
flattened_data = {'row_0_alpha': 'A', 'row_0_value': 1, 'row_1_alpha': 'B', 'row_1_value': 2, 'row_2_alpha': 'C', 'row_2_value': 3}
freeze_index(column_name: str = None) None

Freeze the current index as a new column, expanding it into the current model. The new column is unaffected by any future changes to the primary index column.

Parameters:

column_name (str, optional) – The name for the new frozen index column. If not provided, a default name ‘frzn_id’ will be used.

Raises:

TypeError – If the provided column_name is not of type ‘str’.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01'),
    ('Sarah', 'West', 39, 0.7, '2023-10-01'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18')
]

# Create the model
sdm = SQLDataModel(data, headers)

# Freeze index as new column 'id'
sdm.freeze_index("id")

# View model
print(sdm)

This will output:

┌───┬───────┬─────────┬──────┬─────────┬────────────┬──────┐
│   │ first │ last    │  age │ service │ hire_date  │   id │
├───┼───────┼─────────┼──────┼─────────┼────────────┼──────┤
│ 0 │ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │    0 │
│ 1 │ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │    1 │
│ 2 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │    2 │
│ 3 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │    3 │
│ 4 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │    4 │
└───┴───────┴─────────┴──────┴─────────┴────────────┴──────┘
[5 rows x 6 columns]

Note

classmethod from_csv(csv_source: str, infer_types: bool = True, encoding: str = 'Latin1', delimiter: str = ',', quotechar: str = '"', headers: list[str] = None, **kwargs) SQLDataModel

Returns a new SQLDataModel generated from the provided CSV source, which can be either a file path or a raw delimited string.

Parameters:
  • csv_source (str) – The path to the CSV file or a raw delimited string.

  • infer_types (bool, optional) – Infer column types based on random subset of data. Default is True, when False, all columns are str type.

  • encoding (str, optional) – The encoding used to decode the CSV source if it is a file. Default is ‘Latin1’.

  • delimiter (str, optional) – The delimiter to use when parsing CSV source. Default is ,.

  • quotechar (str, optional) – The character used for quoting fields. Default is ".

  • headers (List[str], optional) – List of column headers. If None, the first row of the CSV source is assumed to contain headers.

  • **kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the provided CSV source.

Return type:

SQLDataModel

Raises:
  • ValueError – If no delimited data is found in csv_source or if parsing with delimiter does not yield valid tabular data.

  • Exception – If an error occurs while attempting to read from or process the provided CSV source.

Examples:

From CSV File
from SQLDataModel import SQLDataModel

# CSV file path or raw CSV string
csv_source = "/path/to/data.csv"

# Create the model using the CSV file, providing custom headers
sdm = SQLDataModel.from_csv(csv_source, headers=['ID', 'Name', 'Value'])
From CSV Literal
from SQLDataModel import SQLDataModel

# CSV data
data = '''
A, B, C
1a, 1b, 1c
2a, 2b, 2c
3a, 3b, 3c
'''

# Create the model
sdm = SQLDataModel.from_csv(data)

# View result
print(sdm)

This will output:

┌──────┬──────┬──────┐
│ A    │ B    │ C    │
├──────┼──────┼──────┤
│ 1a   │ 1b   │ 1c   │
│ 2a   │ 2b   │ 2c   │
│ 3a   │ 3b   │ 3c   │
└──────┴──────┴──────┘
[3 rows x 3 columns]
Change Log:
  • Version 0.4.0 (2024-04-23):
    • Modifed to only parse CSV files and removed all delimiter sniffing with introduction of new method SQLDataModel.from_delimited() to handle other delimiters.

    • Renamed delimiters parameter to delimiter with , set as new default to reflect revised focus on CSV files only.

Note

  • If csv_source is delimited by characters other than those specified, use SQLDataModel.from_delimited() and provide delimiter to delimiters.

  • If headers are provided, the first row parsed from source will be the first row in the table and not discarded.

  • The infer_types argument can be used to infer the appropriate data type for each column:
    • If infer_types = True, a random subset of the data will be used to infer the correct type and cast values accordingly

    • If infer_types = False, values from the first row only will be used to assign types, almost always ‘str’ when reading from CSV.

classmethod from_data(data: Any = None, **kwargs) SQLDataModel

Convenience method to infer the source of data and return the appropriate constructor method to generate a new SQLDataModel instance.

Parameters:
  • data (Any, required) – The input data from which to create the SQLDataModel object.

  • **kwargs – Additional keyword arguments to be passed to the constructor method, see init method for arguments.

Constructor methods are called according to the input type:
Returns:

The SQLDataModel object created from the provided data.

Return type:

SQLDataModel

Raises:
  • TypeError – If the type of data is not supported.

  • ValueError – If the file extension is not found, unsupported, or if the SQL extension is not supported.

  • Exception – If an OS related error occurs during file read operations if data is a filepath.

Example:

from SQLDataModel import SQLDataModel

# Create SQLDataModel from a CSV file
sdm_csv = SQLDataModel.from_data("data.csv", headers=['ID', 'Name', 'Value'])

# Create SQLDataModel from a dictionary
sdm_dict = SQLDataModel.from_data({"ID": int, "Name": str, "Value": float})

# Create SQLDataModel from a list of tuples
sdm_list = SQLDataModel.from_data([(1, 'Alice', 100.0), (2, 'Bob', 200.0)], headers=['ID', 'Name', 'Value'])

# Create SQLDataModel from raw string literal
delimited_literal = '''
A, B, C
1, 2, 3
4, 5, 6
7, 8, 9
'''

# Create the model by having correct constructor inferred
sdm = SQLDataModel.from_data(delimited_literal)

# View output
print(sdm)

This will output:

┌────┬────┬────┐
│ A  │ B  │ C  │
├────┼────┼────┤
│ 1  │ 2  │ 3  │
│ 4  │ 5  │ 6  │
│ 7  │ 8  │ 9  │
└────┴────┴────┘
[3 rows x 3 columns]

Note

  • This method attempts to infer the correct method to call based on data argument, if one cannot be inferred an exception is raised.

  • For data type specific implementation or examples, see related method for appropriate data type.

classmethod from_delimited(source: str, infer_types: bool = True, encoding: str = 'Latin1', delimiters: str = ', \t;|:', quotechar: str = '"', headers: list[str] = None, **kwargs) SQLDataModel

Returns a new SQLDataModel generated from the provided delimited source, which can be either a file path or a raw delimited string.

Parameters:
  • source (str) – The path to the delimited file or a raw delimited string.

  • infer_types (bool, optional) – Infer column types based on random subset of data. Default is True, when False, all columns are str type.

  • encoding (str, optional) – The encoding used to decode the source if it is a file. Default is 'Latin1'.

  • delimiters (str, optional) – Possible delimiters. Default is \s, \t, ;, |, : or , (space, tab, semicolon, pipe, colon or comma).

  • quotechar (str, optional) – The character used for quoting fields. Default is ".

  • headers (list[str], optional) – List of column headers. If None, the first row of the delimited source is assumed to be the header row.

  • **kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the provided CSV source.

Return type:

SQLDataModel

Raises:
  • ValueError – If no delimiter is found in source or if parsing with delimiter does not yield valid tabular data.

  • Exception – If an error occurs while attempting to read from or process the provided CSV source.

Example:

From Delimited Literal
from SQLDataModel import SQLDataModel

# Space delimited literal
source_data = '''
Name Age Height
Beth 27 172.4
Kate 28 162.0
John 30 175.3
Will 35 185.8'''

# Create the model
sdm = SQLDataModel.from_delimited(source_data)

# View output
print(sdm)

This will output:

┌──────┬─────┬─────────┐
│ Name │ Age │  Height │
├──────┼─────┼─────────┤
│ Beth │  27 │  172.40 │
│ Kate │  28 │  162.00 │
│ John │  30 │  175.30 │
│ Will │  35 │  185.80 │
└──────┴─────┴─────────┘
[4 rows x 3 columns]
From Delimited File
from SQLDataModel import SQLDataModel

# Tab separated file
tsv_file = 'persons.tsv'

# Create the model
sdm = SQLDataModel.from_delimited(tsv_file)

Note

  • Use SQLDataModel.from_csv() if delimiter in source is already known and available as this method requires more compute to determine a plausible delimiter.

  • Use SQLDataModel.from_text() if data is not delimited but is a string representation such as an ASCII table or the output from another SQLDataModel instance.

  • If file is delimited by delimiters other than the default targets \s, \t, ;, |, : or , (space, tab, semicolon, pipe, colon or comma) make sure they are provided as single character values to delimiters.

classmethod from_dict(data: dict | list, **kwargs) SQLDataModel

Create a new SQLDataModel instance from the provided dictionary.

Parameters:
  • data (dict) – The dictionary or list of dictionaries to convert to SQLDataModel. If keys are of type int, they will be used as row indexes; otherwise, keys will be used as headers.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the provided dictionary.

Return type:

SQLDataModel

Raises:
  • TypeError – If the provided dictionary values are not of type ‘list’, ‘tuple’, or ‘dict’.

  • ValueError – If the provided data appears to be a list of dicts but is empty.

Example:

from SQLDataModel import SQLDataModel

# Sample data with column orientation
data = {
    'Name': ['Beth', 'John', 'Alice', 'Travis'],
    'Height': [172.4, 175.3, 162.0, 185.8],
    'Age': [27, 30, 28, 35]
}

# Create the model
sdm = SQLDataModel.from_dict(data)

# View it
print(sdm)

This will output:

┌────────┬─────────┬─────┐
│ Name   │  Height │ Age │
├────────┼─────────┼─────┤
│ Beth   │  172.40 │  27 │
│ John   │  175.30 │  30 │
│ Alice  │  162.00 │  28 │
│ Travis │  185.80 │  35 │
└────────┴─────────┴─────┘
[4 rows x 3 columns]

We can also create a model using a dictionary with row orientation:

from SQLDataModel import SQLDataModel

# Sample data with row orientation
data = {
     0: ['Mercury', 0.38]
    ,1: ['Venus', 0.91]
    ,2: ['Earth', 1.00]
    ,3: ['Mars', 0.38]
}

# Create the model with custom headers
sdm = SQLDataModel.from_dict(data, headers=['Planet', 'Gravity'])

# View output
print(sdm)

This will output the model created using row-wise dictionary data:

┌─────────┬─────────┐
│ Planet  │ Gravity │
├─────────┼─────────┤
│ Mercury │    0.38 │
│ Venus   │    0.91 │
│ Earth   │    1.00 │
│ Mars    │    0.38 │
└─────────┴─────────┘
[4 rows x 2 columns]
Change Log:
  • Version 0.6.3 (2024-05-16):
    • Modified to try parsing input data as JSON if initial inspection does not signify row or column orientation.

Note

  • If data orientation suggests JSON like structure, then SQLDataModel.from_json() will attempt to construct the model.

  • Dictionaries in list like orientation can also be used with structures similar to JSON objects.

  • The method determines the structure of the SQLDataModel based on the format of the provided dictionary.

  • If the keys are integers, they are used as row indexes; otherwise, keys are used as headers.

  • See SQLDataModel.to_dict() for converting existing instances of SQLDataModel to dictionaries.

classmethod from_excel(filename: str, worksheet: int | str = 0, min_row: int | None = None, max_row: int | None = None, min_col: int | None = None, max_col: int | None = None, headers: list[str] = None, **kwargs) SQLDataModel

Returns a new SQLDataModel instance from the specified Excel file.

Parameters:
  • filename (str) – The file path to the Excel file, e.g., filename = 'titanic.xlsx'.

  • worksheet (int | str, optional) – The index or name of the worksheet to read from. Defaults to 0, indicating the first worksheet.

  • min_row (int | None, optional) – The minimum row number to start reading data from. Defaults to None, indicating the first row.

  • max_row (int | None, optional) – Maximum row index (1-based) to import. Defaults to None, indicating all rows are read.

  • min_col (int | None, optional) – Minimum column index (1-based) to import. Defaults to None, indicating the first column.

  • max_col (int | None, optional) – Maximum column index (1-based) to import. Defaults to None, indicating all the columns are read.

  • headers (List[str], optional) – The column headers for the data. Default is None, using the first row of the Excel sheet as headers.

  • **kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.

Raises:
  • ModuleNotFoundError – If the required package openpyxl is not installed as determined by _has_xl flag.

  • TypeError – If the filename argument is not of type ‘str’ representing a valid Excel file path.

  • Exception – If an error occurs during Excel read and write operations related to openpyxl processing.

Returns:

A new instance of SQLDataModel created from the Excel file.

Return type:

SQLDataModel

Examples:

We’ll use this Excel file, data.xlsx, as the source for the below examples:

    ┌───────┬─────┬────────┬───────────┐
    │ A     │ B   │ C      │ D         │
┌───┼───────┼─────┼────────┼───────────┤
│ 1 │ Name  │ Age │ Gender │ City      │
│ 2 │ John  │ 25  │ Male   │ Boston    │
│ 3 │ Alice │ 30  │ Female │ Milwaukee │
│ 4 │ Bob   │ 22  │ Male   │ Chicago   │
│ 5 │ Sarah │ 35  │ Female │ Houston   │
│ 6 │ Mike  │ 28  │ Male   │ Atlanta   │
└───┴───────┴─────┴────────┴───────────┘
[ Sheet1 ]

Example 1: Load Excel file with default parameters

from SQLDataModel import SQLDataModel

# Create the model using the default parameters
sdm = SQLDataModel.from_excel('data.xlsx')

# View imported data
print(sdm)

This will output all of the data starting from ‘A1’:

┌───────┬──────┬────────┬───────────┐
│ Name  │  Age │ Gender │ City      │
├───────┼──────┼────────┼───────────┤
│ John  │   25 │ Male   │ Boston    │
│ Alice │   30 │ Female │ Milwaukee │
│ Bob   │   22 │ Male   │ Chicago   │
│ Sarah │   35 │ Female │ Houston   │
│ Mike  │   28 │ Male   │ Atlanta   │
└───────┴──────┴────────┴───────────┘
[5 rows x 4 columns]

Example 2: Load Excel file from specific worksheet

from SQLDataModel import SQLDataModel

# Create the model from 'Sheet2'
sdm = SQLDataModel.from_excel('data.xlsx', worksheet='Sheet2')

# View imported data
print(sdm)

This will output the contents of ‘Sheet2’:

┌────────┬───────┐
│ Gender │ count │
├────────┼───────┤
│ Male   │     3 │
│ Female │     2 │
└────────┴───────┘
[2 rows x 2 columns]

Example 3: Load Excel file with custom headers starting from different row

from SQLDataModel import SQLDataModel

# Use our own headers instead of the Excel ones
new_cols = ['Col A', 'Col B', 'Col C', 'Col D']

# Create the model starting from the 2nd row to ignore the original headers
sdm = SQLDataModel.from_excel('data.xlsx', min_row=2, headers=new_cols)

# View the data
print(sdm)

This will output the data with our renamed headers:

┌───────┬───────┬────────┬───────────┐
│ Col A │ Col B │ Col C  │ Col D     │
├───────┼───────┼────────┼───────────┤
│ John  │    25 │ Male   │ Boston    │
│ Alice │    30 │ Female │ Milwaukee │
│ Bob   │    22 │ Male   │ Chicago   │
│ Sarah │    35 │ Female │ Houston   │
│ Mike  │    28 │ Male   │ Atlanta   │
└───────┴───────┴────────┴───────────┘
[5 rows x 4 columns]

Example 4: Load Excel file with specific subset of columns

from SQLDataModel import SQLDataModel

# Create the model using the middle two columns only
sdm = SQLDataModel.from_excel('data.xlsx', min_col=2, max_col=3)

# View the data
print(sdm)

This will output only the middle two columns:

┌──────┬────────┐
│  Age │ Gender │
├──────┼────────┤
│   25 │ Male   │
│   30 │ Female │
│   22 │ Male   │
│   35 │ Female │
│   28 │ Male   │
└──────┴────────┘
[5 rows x 2 columns]

Note

  • This method entirely relies on openpyxl, see their amazing documentation for further information on Excel file handling in python.

  • If custom headers are provided using the default min_row, then the original headers, if present, will be duplicated.

  • All indicies for min_row, max_row, min_col and max_col are 1-based instead of 0-based, again see openpyxl for more details.

  • See related SQLDataModel.to_excel() for exporting an existing SQLDataModel to Excel.

classmethod from_html(html_source: str, encoding: str = 'utf-8', table_identifier: int | str = 0, infer_types: bool = True, **kwargs) SQLDataModel

Parses HTML table element from one of three possible sources: web page at url, local file at path, raw HTML string literal. If table_identifier is not specified, the first <table> element successfully parsed is returned, otherwise if table_identifier is a str, the parser will return the corresponding ‘id’ or ‘name’ HTML attribute that matches the identifier specified. If table_identifier is an int, the parser will return the table matched as a sequential index after parsing all <table> elements from the top of the page down, starting at ‘0’. By default, the first <table> element found is returned if table_identifier is not specified.

Parameters:
  • html_source (str) – The HTML source, which can be a URL, a valid path to an HTML file, or a raw HTML string. If starts with ‘http’, the argument is considered a url and the table will be parsed from returned the web request. If is a valid file path, the argument is considered a local file and the table will be parsed from its html. If is not a valid url or path, the argument is considered a raw HTML string and the table will be parsed directly from the input.

  • encoding (str) – The encoding to use for reading HTML when html_source is considered a valid url or file path (default is ‘utf-8’).

  • table_identifier (int | str) – An identifier to specify which table to parse if there are multiple tables in the HTML source (default is 0).

  • infer_types (bool, optional) – If column data types should be inferred in the return model. Default is True, meaning column types will be inferred otherwise are returned as ‘str’ types. If is int, identifier is treated as the indexed location of the <table> element on the page from top to bottom starting from zero and will return the corresponding position when encountered. If is str, identifier is treated as a target HTML ‘id’ or ‘name’ attribute to search for and will return the first case-insensitive match when encountered.

  • **kwargs – Additional keyword arguments to pass when using urllib.request.urlopen to fetch HTML from a URL.

Returns:

A new SQLDataModel instance containing the data from the parsed HTML table.

Return type:

SQLDataModel

Raises:
  • TypeError – If html_source is not of type str representing a possible url, filepath or raw HTML stream.

  • HTTPError – Raised from urllib when html_source is considered a url and an HTTP exception occurs.

  • URLError – Raised from urllib when html_source is considered a url and a URL exception occurs.

  • ValueError – If no <table> elements are found or if the targeted table_identifier is not found.

  • OSError – Related exceptions that may be raised when html_source is considered a file path.

Examples:

From Website URL
from SQLDataModel import SQLDataModel

# From URL
url = 'https://en.wikipedia.org/wiki/1998_FIFA_World_Cup'

# Lets get the 94th table from the 1998 World Cup
sdm = SQLDataModel.from_html(url, table_identifier=94)

# View result:
print(sdm)

This will output:

┌────┬─────────────┬────┬────┬────┬────┬────┬────┬────┬─────┬──────┐
│  R │ Team        │ G  │  P │  W │  D │  L │ GF │ GA │ GD  │ Pts. │
├────┼─────────────┼────┼────┼────┼────┼────┼────┼────┼─────┼──────┤
│  1 │ France      │ C  │  7 │  6 │  1 │  0 │ 15 │  2 │ +13 │   19 │
│  2 │ Brazil      │ A  │  7 │  4 │  1 │  2 │ 14 │ 10 │ +4  │   13 │
│  3 │ Croatia     │ H  │  7 │  5 │  0 │  2 │ 11 │  5 │ +6  │   15 │
│  4 │ Netherlands │ E  │  7 │  3 │  3 │  1 │ 13 │  7 │ +6  │   12 │
│  5 │ Italy       │ B  │  5 │  3 │  2 │  0 │  8 │  3 │ +5  │   11 │
│  6 │ Argentina   │ H  │  5 │  3 │  1 │  1 │ 10 │  4 │ +6  │   10 │
│  7 │ Germany     │ F  │  5 │  3 │  1 │  1 │  8 │  6 │ +2  │   10 │
│  8 │ Denmark     │ C  │  5 │  2 │  1 │  2 │  9 │  7 │ +2  │    7 │
└────┴─────────────┴────┴────┴────┴────┴────┴────┴────┴─────┴──────┘
[8 rows x 11 columns]
From Local File
from SQLDataModel import SQLDataModel

# From HTML file
sdm = SQLDataModel.from_html('path/to/file.html')

# View output
print(sdm)

This will output:

┌─────────────┬────────┬──────┐
│ Team        │ Points │ Rank │
├─────────────┼────────┼──────┤
│ Brazil      │ 63.7   │ 1    │
│ England     │ 50.7   │ 2    │
│ Spain       │ 50.0   │ 3    │
│ Germany [a] │ 49.3   │ 4    │
│ Mexico      │ 47.3   │ 5    │
│ France      │ 46.0   │ 6    │
│ Italy       │ 44.3   │ 7    │
│ Argentina   │ 44.0   │ 8    │
└─────────────┴────────┴──────┘
[8 rows x 3 columns]
From Raw HTML
from SQLDataModel import SQLDataModel

# Raw HTML
raw_html =
'''<table id="find-me">
    <tr>
        <th>Col 1</th>
        <th>Col 2</th>
    </tr>
    <tr>
        <td>A</td>
        <td>1</td>
    </tr>
    <tr>
        <td>B</td>
        <td>2</td>
    </tr>
    <tr>
        <td>C</td>
        <td>3</td>
    </tr>
</table>'''

# Create the model and search for id attribute
sdm = SQLDataModel.from_html(raw_html, table_identifier="find-me")

# View output
print(sdm)

This will output:

┌───┬───────┬───────┐
│   │ Col 1 │ Col 2 │
├───┼───────┼───────┤
│ 1 │ B     │ 2     │
│ 2 │ C     │ 3     │
└───┴───────┴───────┘
[3 rows x 2 columns]

Note

  • **kwargs passed to method are used in urllib.request.urlopen if html_source is being considered as a web url.

  • **kwargs passed to method are used in open if html_source is being considered as a filepath.

  • The largest row size encountered will be used as the column_count for the returned SQLDataModel, rows will be padded with None if less.

  • See SQLDataModel.generate_html_table_chunks() for initial source chunking before content fed to SQLDataModel.HTMLParser.

classmethod from_json(json_source: str | list | dict, encoding: str = 'utf-8', **kwargs) SQLDataModel

Creates a new SQLDataModel instance from JSON file path or JSON-like source, flattening if required.

Parameters:
  • json_source (str | list | dict) – The JSON source. If a string, it can represent a file path or a JSON-like object.

  • encoding (str) – The encoding to use when reading from a file. Defaults to ‘utf-8’.

  • **kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.

Returns:

A new SQLDataModel instance created from the JSON source.

Return type:

SQLDataModel

Raises:
  • TypeError – If the json_source argument is not of type ‘str’, ‘list’, or ‘dict’.

  • OSError – If related exception occurs when trying to open and read from json_source as file path.

Examples:

From JSON String Literal
from SQLDataModel import SQLDataModel

# Sample JSON string
json_data = '''[{
    "id": 1,
    "color": "red",
    "value": "#f00"
    },
    {
    "id": 2,
    "color": "green",
    "value": "#0f0"
    },
    {
    "id": 3,
    "color": "blue",
    "value": "#00f"
}]'''

# Create the model
sdm = SQLDataModel.from_json(json_data)

# View result
print(sdm)

This will output:

┌──────┬───────┬───────┐
│   id │ color │ value │
├──────┼───────┼───────┤
│    1 │ red   │ #f00  │
│    2 │ green │ #0f0  │
│    3 │ blue  │ #00f  │
└──────┴───────┴───────┘
[3 rows x 3 columns]
From JSON-like Object
from SQLDataModel import SQLDataModel

# JSON-like sample
json_data = [{
    "alpha": "A",
    "value": "1"
},
{
    "alpha": "B",
    "value": "2"
},
{
    "alpha": "C",
    "value": "3"
}]

# Create the model
sdm = SQLDataModel.from_json(json_data)

# Output
print(sdm)

This will output:

┌───────┬───────┐
│ alpha │ value │
├───────┼───────┤
│ A     │ 1     │
│ B     │ 2     │
│ C     │ 3     │
└───────┴───────┘
[3 rows x 2 columns]
From JSON file
from SQLDataModel import SQLDataModel

# JSON file path
json_data = 'data/json-sample.json'

# Create the model
sdm = SQLDataModel.from_json(json_data, encoding='latin-1')

# View output
print(sdm)

This will output:

┌──────┬────────┬───────┬─────────┐
│   id │ color  │ value │ notes   │
├──────┼────────┼───────┼─────────┤
│    1 │ red    │ #f00  │ primary │
│    2 │ green  │ #0f0  │         │
│    3 │ blue   │ #00f  │ primary │
│    4 │ cyan   │ #0ff  │         │
│    5 │ yellow │ #ff0  │         │
│    5 │ black  │ #000  │         │
└──────┴────────┴───────┴─────────┘
[6 rows x 4 columns]

Note

  • If json_source is deeply-nested it will be flattened according to the staticmethod SQLDataModel.flatten_json()

  • If json_source is a JSON-like string object that is not an array, it will be wrapped according as an array.

classmethod from_latex(latex_source: str, table_identifier: int = 1, encoding: str = 'utf-8', **kwargs) SQLDataModel

Creates a new SQLDataModel instance from the provided LaTeX file or raw literal.

Parameters:
  • latex_source (str) – The LaTeX source containing one or more LaTeX tables. If latex_source is a valid system filepath, source will be treated as a .tex file and parsed. If latex_source is not a valid filepath, source will be parsed as raw LaTeX literal.

  • table_identifier (int, optional) – The index position of the LaTeX table to extract. Default is 1.

  • encoding (str, optional) – The file encoding to use if source is a LaTex filepath. Default is ‘utf-8’;.

  • **kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel instance created from the parsed LaTeX table.

Return type:

SQLDataModel

Raises:
  • TypeError – If the latex_source argument is not of type ‘str’, or if the table_identifier argument is not of type ‘int’.

  • ValueError – If the table_identifier argument is less than 1, or if no tables are found in the LaTeX source.

  • IndexError – If the table_identifier is greater than the number of tables found in the LaTeX source.

Table Indicies:
  • In the last example, sdm will contain the data from the second table found in the LaTeX content.

  • Tables are indexed starting from index 1 at the top of the LaTeX content, incremented as they are found.

  • LaTeX parsing stops after the table specified at table_identifier is found without parsing the remaining content.

Examples:

From LaTeX literal
from SQLDataModel import SQLDataModel

# Raw LaTeX literal
latex_content = '''
\begin{tabular}{|l|r|r|}
\hline
    {Name} & {Age} & {Height} \\
\hline
    John    &   30 &  175.30 \\
    Alice   &   28 &  162.00 \\
    Michael &   35 &  185.80 \\
\hline
\end{tabular}
'''

# Create the model from the LaTeX
sdm = SQLDataModel.from_latex(latex_content)

# View result
print(sdm)

This will output:

┌─────────┬──────┬─────────┐
│ Name    │  Age │  Height │
├─────────┼──────┼─────────┤
│ John    │   30 │  175.30 │
│ Alice   │   28 │  162.00 │
│ Michael │   35 │  185.80 │
└─────────┴──────┴─────────┘
[3 rows x 3 columns]
From LaTeX file
from SQLDataModel import SQLDataModel

# Load LaTeX content from file
latex_file = 'path/to/latex/file.tex'

# Create the model using the path
sdm = SQLDataModel.from_latex(latex_file)
Specifying table identifier
from SQLDataModel import SQLDataModel

# Raw LaTeX literal with multiple tables
latex_content = '''
%% LaTeX with a Table

\begin{tabular}{|l|l|}
\hline
    {Header A} & {Header B} \\
\hline
    Value A1 & Value B1 \\
    Value A2 & Value B2 \\
\hline
\end{tabular}

%% Then another Table

\begin{tabular}{|l|l|}
\hline
    {Header X} & {Header Y} \\
\hline
    Value X1 & Value Y1 \\
    Value X2 & Value Y2 \\
\hline
\end{tabular}
'''

# Create the model from the 2nd table
sdm = SQLDataModel.from_latex(latex_content, table_identifier=2)

# View output
print(sdm)

This will output:

┌──────────┬──────────┐
│ Header X │ Header Y │
├──────────┼──────────┤
│ Value X1 │ Value Y1 │
│ Value X2 │ Value Y2 │
└──────────┴──────────┘
[2 rows x 2 columns]

Note

  • LaTeX tables are identified based on the presence of tabular environments: \begin{tabular}...\end{tabular}.

  • The table_identifier specifies which table to extract when multiple tables are present, beginning at position ‘1’ from the top of the source.

  • The provided kwargs are passed to the SQLDataModel constructor for additional parameters to the instance returned.

classmethod from_markdown(markdown_source: str, table_identifier: int = 1, **kwargs) SQLDataModel

Creates a new SQLDataModel instance from the provided Markdown source file or raw content.

If markdown_source is a valid system path, the markdown file will be parsed. Otherwise, the provided string will be parsed as raw markdown.

Parameters:
  • markdown_source (str) – The Markdown source file path or raw content.

  • table_identifier (int, optional) – The index position of the markdown table to extract. Default is 1.

  • **kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.

Raises:
  • TypeError – If the markdown_source argument is not of type ‘str’, or if the table_identifier argument is not of type ‘int’.

  • ValueError – If the table_identifier argument is less than 1, or if no tables are found in the markdown source.

  • IndexError – If the table_identifier is greater than the number of tables found in the markdown source.

Returns:

The SQLDataModel instance created from the parsed markdown table.

Return type:

SQLDataModel

Table indicies:
  • In the last example, sdm will contain the data from the second table found in the markdown content.

  • Tables are indexed starting from index 1 at the top of the markdown content, incremented as they are found.

  • Markdown parsing stops after the table specified at table_identifier is found without parsing the remaining content.

Examples:

From Markdown Literal
from SQLDataModel import SQLDataModel

# Raw markdown literal
markdown_content = '''
| Item          | Price | # In stock |
|---------------|-------|------------|
| Juicy Apples  | 1.99  | 37         |
| Bananas       | 1.29  | 52         |
| Pineapple     | 3.15  | 14         |
'''

# Create the model from the markdown
sdm = SQLDataModel.from_markdown(markdown_content)

# View result
print(sdm)

This will output:

┌──────────────┬───────┬────────────┐
│ Item         │ Price │ # In stock │
├──────────────┼───────┼────────────┤
│ Juicy Apples │ 1.99  │ 37         │
│ Bananas      │ 1.29  │ 52         │
│ Pineapple    │ 3.15  │ 14         │
└──────────────┴───────┴────────────┘
[3 rows x 3 columns]
From Markdown File
from SQLDataModel import SQLDataModel

# Load markdown content from file
markdown_file_path = 'path/to/markdown_file.md'

# Create the model using the path
sdm = SQLDataModel.from_markdown(markdown_file_path)
Specifying Table Identifier
from SQLDataModel import SQLDataModel

# Raw markdown literal with multiple tables
markdown_content = '''
### Markdown with a Table

| Header A | Header B |
|----------|----------|
| Value A1 | Value B1 |
| Value A2 | Value B2 |

### Then another Table

| Header X | Header Y |
|----------|----------|
| Value X1 | Value Y1 |
| Value X2 | Value Y2 |

'''
# Create the model from the 2nd table
sdm = SQLDataModel.from_markdown(markdown_content, table_identifier=2)

# View output
print(sdm)

This will output:

┌──────────┬──────────┐
│ Header X │ Header Y │
├──────────┼──────────┤
│ Value X1 │ Value Y1 │
│ Value X2 │ Value Y2 │
└──────────┴──────────┘
[2 rows x 2 columns]

Note

  • Markdown tables are identified based on the presence of pipe characters | defining table cells.

  • The table_identifier specifies which table to extract when multiple tables are present, beginning at position ‘1’ from the top of the source.

  • Escaped pipe characters \| within the markdown are replaced with the HTML entity reference &vert; for proper parsing.

  • The provided kwargs are passed to the SQLDataModel constructor for additional parameters to the instance returned.

classmethod from_numpy(array, headers: list[str] = None, **kwargs) SQLDataModel

Returns a SQLDataModel object created from the provided numpy array.

Parameters:
  • array (numpy.ndarray) – The numpy array to convert to a SQLDataModel.

  • headers (list of str, optional) – The list of headers to use for the SQLDataModel. If None, no headers will be used, and the data will be treated as an n-dimensional array. Default is None.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the numpy array.

Return type:

SQLDataModel

Raises:
  • ModuleNotFoundError – If the required package numpy is not found.

  • TypeError – If array argument is not of type numpy.ndarray.

  • DimensionError – If array.ndim != 2 representing a (row, column) tabular array.

Example:

import numpy as np
from SQLDataModel import SQLDataModel

# Sample array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Create the model with custom headers
sdm = SQLDataModel.from_numpy(arr, headers=['Col A', 'Col B', 'Col C])

# View output
print(sdm)

This will output:

┌───────┬───────┬───────┐
│ Col A │ Col B │ Col C │
├───────┼───────┼───────┤
│     1 │     2 │     3 │
│     4 │     5 │     6 │
│     7 │     8 │     9 │
└───────┴───────┴───────┘
[3 rows x 3 columns]

Note

  • Numpy array must have ‘2’ dimensions, the first representing the rows, and the second the columns.

  • If no headers are provided, default headers will be generated as ‘col_N’ where N represents the column integer index.

classmethod from_pandas(df, headers: list[str] = None, **kwargs) SQLDataModel

Returns a SQLDataModel object created from the provided df representing a Pandas DataFrame object. Note that pandas must be installed in order to use this method.

Parameters:
  • df (pandas.DataFrame) – The pandas DataFrame to convert to a SQLDataModel.

  • headers (list[str], optional) – The list of headers to use for the SQLDataModel. Default is None, using the columns from the df object.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the pandas DataFrame.

Return type:

SQLDataModel

Raises:
  • ModuleNotFoundError – If the required package pandas is not found.

  • TypeError – If df argument is not of type pandas.DataFrame.

Example:

import pandas as pd
from SQLDataModel import SQLDataModel

# Create a pandas DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})

# Create the model
sdm = SQLDataModel.from_pandas(df)

Note

  • If headers are not provided, the existing pandas columns will be used as the new SQLDataModel headers.

classmethod from_parquet(filename: str, **kwargs) SQLDataModel

Returns a new SQLDataModel instance from the specified parquet file.

Parameters:
  • filename (str) – The file path to the parquet file, e.g., filename = 'user/data/titanic.parquet'.

  • **kwargs – Additional keyword arguments to pass to the pyarrow read_table function, e.g., filters = [('Name','=','Alice')].

Returns:

A new instance of SQLDataModel created from the parquet file.

Return type:

SQLDataModel

Raises:
  • ModuleNotFoundError – If the required package pyarrow is not installed as determined by _has_pa flag.

  • TypeError – If the filename argument is not of type ‘str’ representing a valid parquet file path.

  • FileNotFoundError – If the specified parquet filename is not found.

  • Exception – If any unexpected exception occurs during the file or parquet reading process.

Example:

from SQLDataModel import SQLDataModel

# Sample parquet file
pq_file = "titanic.parquet"

# Create the model
sdm = SQLDataModel.from_parquet(pq_file)

# View column counts
print(sdm.count())

This will output:

┌────┬─────────────┬──────┬────────┬───────┬───────┐
│    │ column      │   na │ unique │ count │ total │
├────┼─────────────┼──────┼────────┼───────┼───────┤
│  0 │ PassengerId │    0 │    891 │   891 │   891 │
│  1 │ Survived    │    0 │      2 │   891 │   891 │
│  2 │ Pclass      │    0 │      3 │   891 │   891 │
│  3 │ Name        │    0 │    891 │   891 │   891 │
│  4 │ Sex         │    0 │      2 │   891 │   891 │
│  5 │ Age         │  177 │     88 │   714 │   891 │
│  6 │ SibSp       │    0 │      7 │   891 │   891 │
│  7 │ Parch       │    0 │      7 │   891 │   891 │
│  8 │ Ticket      │    0 │    681 │   891 │   891 │
│  9 │ Fare        │    0 │    248 │   891 │   891 │
│ 10 │ Cabin       │  687 │    147 │   204 │   891 │
│ 11 │ Embarked    │    2 │      3 │   889 │   891 │
└────┴─────────────┴──────┴────────┴───────┴───────┘
[12 rows x 5 columns]

Note

classmethod from_pickle(filename: str = None, **kwargs) SQLDataModel

Returns the SQLDataModel object from the provided filename. If None, the current directory will be scanned for the default SQLDataModel.to_pickle() format.

Parameters:
  • filename (str, optional) – The name of the pickle file to load. If None, the current directory will be scanned for the default filename. Default is None.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor, these will override the properties loaded from filename.

Returns:

The SQLDataModel object created from the loaded pickle file.

Return type:

SQLDataModel

Raises:
  • TypeError – If filename is provided but is not of type ‘str’ representing a valid pickle filepath.

  • FileNotFoundError – If the provided filename could not be found or does not exist.

Example:

from SQLDataModel import SQLDataModel

headers = ['Name','Age','Sex']
data = [('Alice', 20, 'F'), ('Bob', 25, 'M'), ('Gerald', 30, 'M')]

# Create the model with sample data
sdm = SQLDataModel(data=data, headers=headers)

# Filepath
pkl_file = 'people.sdm'

# Save the model
sdm.to_pickle(filename=pkl_file)

# Load it back from file
sdm = SQLDataModel.from_pickle(filename=pkl_file)

Note

  • All data, headers, data types and display properties will be saved when pickling.

  • Any additional kwargs provided will override those saved in the pickled model.

classmethod from_polars(df, headers: list[str] = None, **kwargs) SQLDataModel

Returns a SQLDataModel object created from the provided df representing a Polars DataFrame object. Note that polars must be installed in order to use this method.

Parameters:
  • df (polars.DataFrame) – The Polars DataFrame to convert to a SQLDataModel.

  • headers (list[str], optional) – The list of headers to use for the SQLDataModel. Default is None, using the columns from the df object.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the Polars DataFrame.

Return type:

SQLDataModel

Raises:
  • ModuleNotFoundError – If the required package polars is not found.

  • TypeError – If df argument is not of type polars.DataFrame.

Example:

import polars as pl
from SQLDataModel import SQLDataModel

# Sample data
data = {
    'Name': ['Beth', 'John', 'Alice', 'Travis'],
    'Age': [27, 30, 28, 35],
    'Height': [172.4, 175.3, 162.0, 185.8]
}

# Create the polars DataFrame
df = pl.DataFrame(data)

# Create a SQLDataModel object
sdm = SQLDataModel.from_polars(df)

# View result
print(sdm)

This will output a SQLDataModel constructed from the Polars df:

┌────────┬─────┬─────────┐
│ Name   │ Age │  Height │
├────────┼─────┼─────────┤
│ Beth   │  27 │  172.40 │
│ John   │  30 │  175.30 │
│ Alice  │  28 │  162.00 │
│ Travis │  35 │  185.80 │
└────────┴─────┴─────────┘
[4 rows x 3 columns]

Note

  • If headers are not provided, the columns from the provided DataFrame’s columns will be used as the new SQLDataModel headers.

  • Polars uses different data types than those used by SQLDataModel, see SQLDataModel.set_column_dtypes() for specific casting rules.

  • See related SQLDataModel.to_polars() for the inverse method of converting a SQLDataModel into a Polars DataFrame object.

classmethod from_pyarrow(table, **kwargs) SQLDataModel

Returns a new SQLDataModel instance from the provided Apache Arrow object.

Parameters:
  • table (pyarrow.lib.Table) – Apache Arrow object from which to construct a new SQLDataModel object.

  • **kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.

Raises:
  • ModuleNotFoundError – If the required package pyarrow is not installed.

  • TypeError – If the provided table argument is not of type ‘pyarrow.lib.Table’.

Returns:

A new SQLDataModel instance representing the data in the provided Apache Arrow object.

Return type:

SQLDataModel

Example:

import pyarrow as pa
from SQLDataModel import SQLDataModel

# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Grade': [3.8, 3.9, 3.2],
}

# Create PyArrow table from data
table = pa.Table.from_pydict(data)

# Create model from PyArrow table
sdm = SQLDataModel.from_pyarrow(table)

This will output:

┌─────────┬──────┬───────┐
│ Name    │  Age │ Grade │
├─────────┼──────┼───────┤
│ Alice   │   25 │  3.80 │
│ Bob     │   30 │  3.90 │
│ Charlie │   35 │  3.20 │
└─────────┴──────┴───────┘
[3 rows x 3 columns]

Note

classmethod from_shape(shape: tuple[int, int], fill: Any = None, headers: list[str] = None, dtype: Literal['bytes', 'date', 'datetime', 'float', 'int', 'str'] = None, **kwargs) SQLDataModel

Returns a SQLDataModel from shape (N rows, M columns) as a convenience method to quickly build a model through an iterative approach. By default, no particular data type is assigned given the flexibility of sqlite3, however one can be inferred by providing an initial fill value or explicitly by providing the dtype argument.

Parameters:
  • shape (tuple[int, int]) – The shape to initialize the SQLDataModel with as (M, N) where M is the number of rows and N is the number of columns.

  • fill (Any, optional) – The scalar fill value to populate the new SQLDataModel with. Default is None, using SQL null values or deriving from dype if provided.

  • headers (list[str], optional) – The headers to use for the model. Default is None, incrementing headers 0, 1, ..., N where N is the number of columns.

  • dtype (str, optional) – A valid python or SQL datatype to initialize the n-dimensional model with. Default is None, using the SQL text type.

  • **kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.

Raises:
  • TypeError – If M or N are not of type ‘int’ representing a valid shape to initialize a SQLDataModel with.

  • ValueError – If M or N are not positive integer values representing valid nonzero row and column dimensions.

  • ValueError – If dtype is not a valid python or SQL convertible datatype to initialize the model with.

Returns:

Instance with the specified number of rows and columns, initialized with by dtype fill values or with None values (default).

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Create a 3x3 model filled by 'X'
sdm = SQLDataModel.from_shape((3,3), fill='X')

# View it
print(sdm)

This will output a 3x3 grid of ‘X’ characters:

┌───┬─────┬─────┬─────┐
│   │ 0   │ 1   │ 2   │
├───┼─────┼─────┼─────┤
│ 0 │ X   │ X   │ X   │
│ 1 │ X   │ X   │ X   │
│ 2 │ X   │ X   │ X   │
└───┴─────┴─────┴─────┘
[3 rows x 3 columns]

We can iteratively build the model from the shape dimensions:

from SQLDataModel import SQLDataModel

# Define shape
shape = (6,6)

# Initialize the multiplcation table with integer dtypes
mult_table = SQLDataModel.from_shape(shape=shape, dtype='int')

# Construct the table values
for x in range(shape[0]):
    for y in range(shape[1]):
        mult_table[x, y] = x * y

# View the multiplcation table
print(mult_table)

This will output our 6x6 multiplication table:

┌───┬─────┬─────┬─────┬─────┬─────┬─────┐
│   │   0 │   1 │   2 │   3 │   4 │   5 │
├───┼─────┼─────┼─────┼─────┼─────┼─────┤
│ 0 │   0 │   0 │   0 │   0 │   0 │   0 │
│ 1 │   0 │   1 │   2 │   3 │   4 │   5 │
│ 2 │   0 │   2 │   4 │   6 │   8 │  10 │
│ 3 │   0 │   3 │   6 │   9 │  12 │  15 │
│ 4 │   0 │   4 │   8 │  12 │  16 │  20 │
│ 5 │   0 │   5 │  10 │  15 │  20 │  25 │
└───┴─────┴─────┴─────┴─────┴─────┴─────┘
[6 rows x 6 columns]
Change Log:
  • Version 0.5.2 (2024-05-13):
    • Added shape parameter in lieu of separate n_rows and n_cols arguments.

    • Added fill parameter to populate resulting SQLDataModel with values to override type-specific initialization defaults.

    • Added headers parameter to explicitly set column names when creating the SQLDataModel.

    • Added **kwargs parameter to align more closely with usage patterns of other model initializing constructor methods.

Note

  • If both fill and dtype are provided, the data type will be derived from type(fill) overriding or ignoring the specified dtype.

  • If only dtype is provided, sensible default initialization fill values will be used to populate the model such as 0 or 0.0 for numeric and empty string or null for others.

  • For those data types not natively implemented by sqlite3 such as date and datetime, today’s date and now’s datetime will be used respectively for initialization values.

classmethod from_sql(sql: str, con: Connection | Any, dtypes: dict = None, **kwargs) SQLDataModel

Create a SQLDataModel object by executing the provided SQL query using the specified SQL connection. If a single word is provided as the sql, the method wraps it and executes a select all treating the text as the target table.

Supported Connection APIs:
  • SQLite using sqlite3

  • PostgreSQL using psycopg2

  • SQL Server ODBC using pyodbc

  • Oracle using cx_Oracle

  • Teradata using teradatasql

Parameters:
  • sql (str) – The SQL query to execute and use to create the SQLDataModel.

  • con (sqlite3.Connection | Any) – The database connection object, supported connection APIs are sqlite3, psycopg2, pyodbc, cx_Oracle, teradatasql

  • dtypes (dict, optional) – A dictionary of the format 'column': 'python dtype' to assign to values. Default is None, mapping types from source connection.

  • **kwargs – Additional arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the executed SQL query.

Return type:

SQLDataModel

Raises:
  • TypeError – If dtypes argument is provided and is not of type dict representing python data types to assign to values.

  • SQLProgrammingError – If the provided SQL connection is not opened or valid, or the SQL query is invalid or malformed.

  • DimensionError – If the provided SQL query returns no data.

Examples:

From SQL Table
from SQLDataModel import SQLDataModel

# Single word parameter
sdm = SQLDataModel.from_sql("table_name", sqlite3.Connection)

# Equilavent query executed
sdm = SQLDataModel.from_sql("select * from table_name", sqlite3.Connection)
From SQLite Database
import sqlite3
from SQLDataModel import SQLDataModel

# Create connection object
sqlite_db_conn = sqlite3.connect('./database/users.db')

# Basic usage with a select query
sdm = SQLDataModel.from_sql("SELECT * FROM my_table", sqlite_db_conn)

# When a single word is provided, it is treated as a table name for a select all query
sdm_table = SQLDataModel.from_sql("my_table", sqlite_db_conn)
From PostgreSQL Database
import psycopg2
from SQLDataModel import SQLDataModel

# Create connection object
pg_db_conn = psycopg2.connect('dbname=users user=postgres password=postgres')

# Basic usage with a select query
sdm = SQLDataModel.from_sql("SELECT * FROM my_table", pg_db_conn)

# When a single word is provided, it is treated as a table name for a select all query
sdm_table = SQLDataModel.from_sql("my_table", pg_db_conn)
From SQL Server Databse
import pyodbc
from SQLDataModel import SQLDataModel

# Create connection object
con = pyodbc.connect("DRIVER={SQL Server};SERVER=host;DATABASE=db;UID=user;PWD=pw;")

# Basic usage with a select query
sdm = SQLDataModel.from_sql("SELECT * FROM my_table", con)

# When a single word is provided, it is treated as a table name for a select all query
sdm_table = SQLDataModel.from_sql("my_table", con)

Note

  • Unsupported connection object will output a SQLDataModelWarning advising unstable or undefined behaviour.

  • The dtypes, if provided, are only applied to sqlite3 connection objects as remaining supported connections implement SQL to python adapters.

  • See related SQLDataModel.to_sql() for writing to SQL database connections.

classmethod from_text(text_source: str, table_identifier: int = 1, encoding: str = 'utf-8', headers: list[str] = None, **kwargs) SQLDataModel

Returns a new SQLDataModel generated from the provided text_source, either as a file if the path exists, or from a raw string literal if the path does not exist.

Parameters:
  • text_source (str) – The path to the tabular data file or a raw string literal containing tabular data.

  • table_identifier (int, optional) – The index position of the target table within the text source. Default is 1.

  • encoding (str, optional) – The encoding used to decode the text source if it is a file. Default is ‘utf-8’.

  • headers (list, optional) – The headers to use for the provided data. Default is to use the first row.

  • **kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.

Returns:

The SQLDataModel object created from the provided tabular data.

Return type:

SQLDataModel

Raises:
  • TypeError – If text_source is not a string or table_identifier is not an integer.

  • ValueError – If no tabular data is found in text_source, if parsing fails to extract valid tabular data, or if the provided table_identifier is out of range.

  • IndexError – If the provided table_identifier exceeds the number of tables found in text_source.

  • Exception – If an error occurs while attempting to read from or process the provided text_source.

Example:

from SQLDataModel import SQLDataModel

# Text source containing tabular data
text_source = "/path/to/tabular_data.txt"

# Create the model using the text source
sdm = SQLDataModel.from_text(text_source, table_identifier=2)

Note

  • This method is made for parsing SQLDataModel formatted text, such as the kind generated with print(sdm) or the output created by the inverse method SQLDataModel.to_text()

  • For parsing other delimited tabular data, this method calls the related SQLDataModel.from_csv() method, which parses tabular data constructed with common delimiters.

generate_apply_function_stub() str

Generates a function template using the current SQLDataModel to format function arguments for the SQLDataModel.apply_function_to_column() method.

Returns:

A string representing the function template.

Return type:

str

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('data.csv')

# Create the stub
stub = sdm.generate_apply_function_stub()

# View it
print(stub)

This will output:

def func(user_name:str, user_age:int, user_salaray:float):
    # apply logic and return value
    return

Containing all the required inputs and column names needed to generate a compatible function to apply to the model and can be copy pasted into existing code.

Note

  • This method is to meant as a general informative tool or for debugging assistance if needed

  • See SQLDataModel.apply() method for usage and implementation of functions in SQLDataModel using sqlite3

static generate_html_table_chunks(html_source: str) Generator[str, None, None]

Generate chunks of HTML content for all <table> elements found in provided source as complete and unbroken chunks for parsing.

Parameters:

html_source (str) – The raw HTML content from which to generate chunks.

Raises:

ValueError – If zero <table> elements were found in html_source provided.

Yields:

str – Chunks of HTML content containing complete <table> elements.

Example:

from SQLDataModel import SQLDataModel

# HTML content to chunk
html_source = '''
<html>
    <table><tr><td>Table 1</td></tr></table>
    ...
    <p>Non-table elements</p>
    ...
    <table><tr><td>Table 2</td></tr></table>
</html>
'''

# Generate and view the returned table chunks
for chunk in SQLDataModel.generate_html_table_chunks(html_source):
    print('Chunk:', chunk)

This will output:

Chunk: <table><tr><td>Table 1</td></tr></table>
Chunk: <table><tr><td>Table 2</td></tr></table>

Note

  • HTML content before the first <table> element and after the last </table> element is ignored and not yielded.

  • See SQLDataModel.from_html() for full implementation and how this function is used for HTML parsing.

get_column_alignment() str

Returns the current column_alignment property value, dynamic by default.

Returns:

The current value of the column_alignment property.

Return type:

str

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get the current alignment value
alignment = sdm.get_column_alignment()

# Outputs 'dynamic'
print(alignment)

Note

get_column_dtypes(columns: str | int | list = None, dtypes: Literal['python', 'sql'] = 'python') dict

Get the data types of specified columns as either Python or SQL datatypes as a dict in the format of {'column': 'dtype'}.

Parameters:
  • columns (str | int | list) – The column or columns for which to retrieve data types. Defaults to all columns.

  • dtypes (Literal["python", "sql"]) – The format in which to retrieve data types. Defaults to “python”.

Raises:
  • TypeError – If columns is not of type str, int, or list.

  • IndexError – If columns is of type int and the index is outside the valid range.

  • ValueError – If a specified column in columns is not found in the current dataset. Use SQLDataModel.get_headers() to view valid columns.

Returns:

A dictionary mapping column names to their data types.

Return type:

dict

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01'),
    ('Sarah', 'West', 39, 0.7, '2023-10-01'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18')
]

# Create the model
sdm = SQLDataModel(data, headers)

# Get all column python dtypes
sdm_dtypes = sdm.get_column_dtypes()

# View dict items
for col, dtype in sdm_dtypes.items():
    print(f"{col}: {dtype}")

This will output:

first: str
last: str
age: int
service: float
hire_date: date

Get SQL data types as well:

# Get specific column sql dtypes
sdm_dtypes = sdm.get_column_dtypes(columns=['first','age','service'], dtypes="sql")

# View dict items
for col, dtype in sdm_dtypes.items():
    print(f"{col}: {dtype}")

This will output:

first: TEXT
age: INTEGER
service: REAL

Note

  • SQLDataModel index column is not included, only columns specified in the SQLDataModel.headers attribute are in scope.

  • Only the dtypes are returned, any primary key references are removed to ensure compatability with external calls.

  • Python datatypes are returned in lower case, while SQL dtypes are returned in upper case to reflect convention.

  • See SQLDataModel.dtypes for direct mapping from column to data type returned as {'col': 'dtype'}.

get_display_float_precision() int

Retrieves the current float display precision used exclusively for representing the values of real numbers in the repr method for the SQLDataModel. Default value is set to 4 decimal places of precision.

Returns:

The current float display precision.

Return type:

int

Note

  • The float display precision is the number of decimal places to include when displaying real numbers in the string representation of the SQLDataModel.

  • This value is utilized in the repr method to control the precision of real number values.

  • The method does not affect the actual value of float dtypes in the underlying SQLDataModel

get_display_index() bool

Returns the current value set at SQLDataModel.display_index, which determines whether or not the index is displayed in the SQLDataModel representation.

Returns:

The current value of the display_index property.

Return type:

bool

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get the current value for displaying the index
display_index = sdm.get_display_index()

# Output: True
print(display_index)

Note

get_display_max_rows() int | None

Retrieves the current value at SQLDataModel.display_max_rows, which determines the maximum rows displayed for the SQLDataModel.

Returns:

The current value set at SQLDataModel.display_max_rows.

Return type:

int or None

Example:

from SQLDataModel import SQLDataModel

# Create model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get current value
display_max_rows = sdm.get_display_max_rows()

# By default rows will be limited by current terminal height
print(display_max_rows) # None

Note

  • This does not affect the actual number of rows in the model, only the maximum displayed.

  • Use SQLDataModel.set_display_max_rows() to explicitly set a max row limit instead of using terminal height.

get_headers() list[str]

Returns the current SQLDataModel headers.

Returns:

A list of strings representing the headers.

Return type:

list

Example:

from SQLDataModel import SQLDataModel

# Create model
sdm = SQLDataModel.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# Get current model headers
headers = sdm.get_headers()

# Display values
print(headers) # outputs: ['First Name', 'Last Name', 'Salary']
get_indicies() tuple

Returns the current valid row indicies for the SQLDataModel instance.

Returns:

A tuple of the current values for SQLDataModel.sql_idx in ascending order.

Return type:

tuple

Example:

from SQLDataModel import SQLDataModel

headers = ['Name', 'Age', 'Height']
data = [('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8)]

# Create the model
sdm = SQLDataModel(data, headers)

# Get current valid indicies
valid_indicies = sdm.get_indicies()

# View results
print(valid_indicies)

This will output:

(0, 1, 2)
Notes
  • Primary use is to confirm valid model indexing when starting index != 0 or filtering changes minimum/maximum indexes.

get_max_column_width() int

Returns the current max_column_width property value.

Returns:

The current value of the max_column_width property.

Return type:

int

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get the current max column width value
max_width = sdm.get_max_column_width()

# Output
print(max_width)  # 32
get_min_column_width() int

Returns the current min_column_width property value.

Returns:

The current value of the min_column_width property.

Return type:

int

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Get and save the current value
min_width = sdm.get_min_column_width()

# Output
print(min_width)  # 6
get_model_name() str

Returns the SQLDataModel table name currently being used by the model as an alias for any SQL queries executed by the user and internally.

Returns:

The current SQLDataModel table name set by value of attribute SQLDataModel.model_name.

Return type:

str

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['Column1', 'Column2'])

# Get the current name
model_name = sdm.get_model_name()

# View it
print(f'The model is currently using the table name: {model_name}')
get_shape() tuple[int, int]

Returns the current shape of the SQLDataModel as a tuple of (rows x columns).

Returns:

A tuple representing the current dimensions of rows and columns in the SQLDataModel.

Return type:

tuple[int, int]

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel([[1,2,3],
                    [4,5,6],
                    [7,8,9]])

# Get the current shape
shape = sdm.get_shape()

# View it
print("shape:", shape)

This will output:

shape: (3, 3)

The shape can also be seen when printing the model:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel([[1,2,3],
                    [4,5,6],
                    [7,8,9]])

# View it and the shape
print(sdm, "<-- shape is also visible here")

This will output:

┌───┬───────┬───────┬───────┐
│   │ col_0 │ col_1 │ col_2 │
├───┼───────┼───────┼───────┤
│ 0 │     1 │     2 │     3 │
│ 1 │     4 │     5 │     6 │
│ 2 │     7 │     8 │     9 │
└───┴───────┴───────┴───────┘
[3 rows x 3 columns] <-- shape is also visible here
Change Log:
  • Version 0.3.6 (2024-04-09):

Note

classmethod get_supported_sql_connections() tuple

Returns the currently tested DB API 2.0 dialects for use with SQLDataModel.from_sql() method.

Returns:

A tuple of supported DB API 2.0 dialects.

Return type:

tuple

Example:

from SQLDataModel import SQLDataModel

# Get supported dialects
supported_dialects = SQLDataModel.get_supported_sql_connections()

# View details
print(supported_dialects)

# Outputs
supported_dialects = ('sqlite3', 'psycopg2', 'pyodbc', 'cx_oracle', 'teradatasql')
group_by(columns: str | list[str], order_by_count: bool = True) SQLDataModel

Returns a new SQLDataModel after performing a group by operation on specified columns.

Parameters:
  • columns (str, list, tuple) – Columns to group by. Accepts either individual strings or a list/tuple of strings.

  • order_by_count (bool, optional) – If True (default), orders the result by count. If False, orders by the specified columns.

Raises:
  • TypeError – If the columns argument is not of type str, list, or tuple.

  • ValueError – If any specified column does not exist in the current model.

  • SQLProgrammingError – If any specified columns or aggregate keywords are invalid or incompatible with the current model.

Returns:

A new SQLDataModel instance containing the result of the group by operation.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Sarah', 'West', 39, 0.7, '2023-10-01', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female')
]
# Create the model
sdm = SQLDataModel(data, headers, display_float_precision=2, display_index=True)

# Group by 'gender' column
sdm_gender = sdm.group_by("gender")

# View model
print(sdm_gender)

This will output:

┌───┬────────┬───────┐
│   │ gender │ count │
├───┼────────┼───────┤
│ 0 │ Male   │     3 │
│ 1 │ Female │     2 │
└───┴────────┴───────┘
[2 rows x 2 columns]

Multiple columns can also be used to group by:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('data.csv')

# Group by multiple columns
sdm.group_by(["country", "state", "city"])

Note

  • Use order_by_count=False to change ordering from count to column arguments provided.

head(n_rows: int = 5) SQLDataModel

Returns the first n_rows of the current SQLDataModel.

Parameters:

n_rows (int, optional) – Number of rows to return. Defaults to 5.

Raises:

TypeError – If n_rows argument is not of type ‘int’ representing the number of rows to return from the head of the model.

Returns:

A new SQLDataModel instance containing the specified number of rows.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Countries data available for sample dataset
url = 'https://developers.google.com/public-data/docs/canonical/countries_csv'

# Create the model
sdm = SQLDataModel.from_html(url)

# Get head of model
sdm_head = sdm.head()

# View it
print(sdm_head)

This will grab the top 5 rows by default:

┌───┬─────────┬──────────┬───────────┬────────────────┐
│   │ country │ latitude │ longitude │ name           │
├───┼─────────┼──────────┼───────────┼────────────────┤
│ 0 │ AF      │  33.9391 │   67.7100 │ Afghanistan    │
│ 1 │ AL      │  41.1533 │   20.1683 │ Albania        │
│ 2 │ DZ      │  28.0339 │    1.6596 │ Algeria        │
│ 3 │ AS      │ -14.2710 │ -170.1322 │ American Samoa │
│ 4 │ AD      │  42.5462 │    1.6016 │ Andorra        │
└───┴─────────┴──────────┴───────────┴────────────────┘
[5 rows x 4 columns]

Note

  • See related SQLDataModel.tail() for the opposite, grabbing the bottom n_rows from the current model.

header_master

Maps the current model’s column metadata in the format of 'column_name': ('sql_dtype', 'py_dtype', is_regular_column, 'default_alignment'), updated by SQLDataModel._update_model_metadata().

Type:

dict[str, tuple]

headers

The current column names of the model. If not provided, default column names will be used.

Type:

list[str]

hstack(*other: SQLDataModel, inplace: bool = False) SQLDataModel

Horizontally stacks one or more SQLDataModel objects to the current model.

Parameters:
  • other (SQLDataModel or sequence of) – The SQLDataModel objects to horizontally stack.

  • inplace (bool, optional) – If True, performs the horizontal stacking in-place, modifying the current model. Defaults to False, returning a new SQLDataModel.

Returns:

The horizontally stacked SQLDataModel instance when inplace is False.

Return type:

SQLDataModel

Raises:
  • ValueError – If no additional SQLDataModels are provided for horizontal stacking.

  • TypeError – If any argument in ‘other’ is not of type SQLDataModel, list, or tuple.

  • SQLProgrammingError – If an error occurs when updating the model values in place.

Example:

from SQLDataModel import SQLDataModel

# Create models A and B
sdm_a = SQLDataModel([('A', 'B'), ('1', '2')], headers=['A1', 'A2'])
sdm_b = SQLDataModel([('C', 'D'), ('3', '4')], headers=['B1', 'B2'])

# Horizontally stack B onto A
sdm_ab = sdm_a.hstack(sdm_b)

# View stacked model
print(sdm_ab)

This will output the result of stacking B onto A, using each model’s headers and dtypes:

┌─────┬─────┬─────┬─────┐
│ A1  │ A2  │ B1  │ B2  │
├─────┼─────┼─────┼─────┤
│ A   │ B   │ C   │ D   │
│ 1   │ 2   │ 3   │ 4   │
└─────┴─────┴─────┴─────┘
[2 rows x 4 columns]

Multiple models can be stacked simultaneously, here we stack a total of 3 models:

# Create a third model C
sdm_c = SQLDataModel([('E', 'F'), ('5', '6')], headers=['C1', 'C2'])

# Horizontally stack three models
sdm_abc = sdm_a.hstack([sdm_b, sdm_c])

# View stacked result
print(sdm_abc)

This will output the result of stacking C and B onto A:

┌─────┬─────┬─────┬─────┬─────┬─────┐
│ A1  │ A2  │ B1  │ B2  │ C1  │ C2  │
├─────┼─────┼─────┼─────┼─────┼─────┤
│ A   │ B   │ C   │ D   │ E   │ F   │
│ 1   │ 2   │ 3   │ 4   │ 5   │ 6   │
└─────┴─────┴─────┴─────┴─────┴─────┘
[2 rows x 6 columns]

Note

  • Model dimensions will be truncated or padded to coerce compatible dimensions when stacking, use SQLDataModel.merge() for strict SQL joins instead of hstack.

  • Headers and data types are inherited from all the models being stacked, this requires aliasing duplicate column names if present, see SQLDataModel.alias_duplicates() for aliasing rules.

  • Use setitem syntax such as sdm['New Column'] = values to create new columns directly into the current model instead of stacking or see SQLDataModel.add_column_with_values() for convenience method accomplishing the same.

  • See SQLDataModel.vstack() for vertical stacking.

indicies

The current valid row indicies of the model.

Type:

tuple

infer_dtypes(n_samples: int = 16, date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') None

Infer and set data types for columns based on a random subset of n_samples from the current model. The dateutil library is required for complex date and datetime parsing, if the module is not found then date_format and datetime_format will be used for dates and datetimes respectively.

Parameters:
  • n_samples (int) – The number of random samples to use for data type inference. Default set to 16.

  • date_format (str) – The format string to use for parsing date values if dateutil library is not found. Default is ‘%Y-%m-%d’.

  • datetime_format (str) – The format string to use for parsing datetime values if dateutil library is not found. Default is ‘%Y-%m-%d %H:%M:%S’.

Raises:
  • TypeError – If argument for n_samples is not of type int or if argument for date_format or datetime_format is not of type ‘str’.

  • ValueError – If the current model contains zero columns from which to infer types from.

  • DimensionError – If the current model contains insufficient rows to sample from.

Returns:

Inferred column types are updated and None is returned.

Return type:

None

Example:

from SQLDataModel import SQLDataModel

# Sample data of ``str`` containing probable datatypes
headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', '27', '1.22', '2023-02-01'),
    ('Sarah', 'West', '39', '0.7', '2023-10-01'),
    ('Mike', 'Harlin', '36', '3.9', '2020-08-27'),
    ('Pat', 'Douglas', '42', '11.5', '2015-11-06'),
    ('Kelly', 'Lee', '32', '8.0', '2016-09-18')
]

# Create the model
sdm = SQLDataModel(data, headers)

# Get current column dtypes for reference
dtypes_before = sdm.get_column_dtypes()

# Infer and set data types based on 10 random samples
sdm.infer_dtypes(n_samples=10)

# View updated model
print(sdm)

This will output data with dtypes correctly aligned:

┌───────┬─────────┬──────┬─────────┬────────────┐
│ first │ last    │  age │ service │ hire_date  │
├───────┼─────────┼──────┼─────────┼────────────┤
│ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │
│ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │
│ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │
│ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │
│ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │
└───────┴─────────┴──────┴─────────┴────────────┘
[5 rows x 5 columns]
# Get new column types to confirm
dtypes_after = sdm.get_column_dtypes()

# View updated dtypes
for col in sdm.headers:
    print(f"{col:<10} {dtypes_before[col]} -> {dtypes_after[col]}")

This will output:

first:      str -> str
last:       str -> str
age:        str -> int
service:    str -> float
hire_date:  str -> date
Related:

Note

  • If a single str instance is found in the samples, the corresponding column dtype will remain as str to avoid data loss.

  • Co-occurences of int & float, or date & datetime will favor the superset dtype after infer_threshold is met, so float and datetime respectively.

  • If a single datetime instance is found amongst a higher proportion of date dtypes, datetime will be used according to second rule.

  • If a single float instance is found amongst a higher proportion of int dtypes, float will be used according to second rule.

  • Ties between dtypes are broken according to current type < str < float < int < datetime < date < bytes < None

  • This method calls the set_column_dtypes() method once the column dtypes have been inferred if they differ from the current dtype.

static infer_str_type(obj: str, date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') str

Infer the data type of the input object.

Parameters:
  • obj (str) – The object for which the data type is to be inferred.

  • date_format (str) – The format string to use for parsing date values. Default is ‘%Y-%m-%d’.

  • datetime_format (str) – The format string to use for parsing datetime values. Default is ‘%Y-%m-%d %H:%M:%S’.

Returns:

The inferred data type.

Return type:

str

Inference:
  • 'str': If the input object is a string, or cannot be parsed as another data type.

  • 'date': If the input object represents a date without time information.

  • 'datetime': If the input object represents a datetime with both date and time information.

  • 'int': If the input object represents an integer.

  • 'float': If the input object represents a floating-point number.

  • 'bool': If the input object represents a boolean value.

  • 'bytes': If the input object represents a binary array.

  • 'None': If the input object is None, empty, or not a string.

Note

  • This method attempts to infer the data type of the input object by evaluating its content.

  • If the input object is a string, it is parsed to determine whether it represents a date, datetime, integer, or float.

  • If the input object is not a string or cannot be parsed, its type is determined based on its Python type (bool, int, float, bytes, or None).

static infer_types_from_data(input_data: list[list], date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') list[str]

Infer the best types of input_data by using a simple presence-based voting scheme. Sampling is assumed prior to function call, treating input_data as already a sampled subset from the original data.

Parameters:
  • input_data (list[list]) – A list of lists containing the input data.

  • date_format (str) – The format string to use for parsing date values. Default is ‘%Y-%m-%d’.

  • datetime_format (str) – The format string to use for parsing datetime values. Default is ‘%Y-%m-%d %H:%M:%S’.

Returns:

A list representing the best-matching inferred types for each column based on the sampled data.

Return type:

list

Note

  • If multiple types are present in the samples, the most appropriate type is inferred based on certain rules.

  • If a column contains both date and datetime instances, the type is inferred as datetime.

  • If a column contains both int and float instances, the type is inferred as float.

  • If a column contains only str instances or multiple types with no clear choice, the type remains as str.

Related:
insert_row(index: int, values: list | tuple, on_conflict: Literal['replace', 'ignore'] = 'replace') None

Inserts a new row into the SQLDataModel at the specified index with the provided values.

Parameters:
  • index (int) – The position at which to insert the row.

  • values (list or tuple) – The values to be inserted into the row.

  • on_conflict (Literal['replace', 'ignore'], optional) – Specifies the action to take if the index already exists. Default is ‘replace’.

Raises:
  • TypeError – If index is not an integer or values is not a list or tuple.

  • ValueError – If on_conflict is not 'replace' or 'ignore'.

  • DimensionError – If the dimensions of the provided values are incompatible with the current model dimensions.

  • SQLProgrammingError – If there is an issue with the SQL execution during the insertion.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Sample data
data = [('Alice', 20, 'F'), ('Billy', 25, 'M'), ('Chris', 30, 'M')]

# Create the model
sdm = SQLDataModel(data, headers=['Name','Age','Sex'])

# Insert a new row at index 3
sdm.insert_row(3, ['David', 35, 'M'])

# Insert or replace row at index 1
sdm.insert_row(1, ['Beth', 27, 'F'], on_conflict='replace')

# View result
print(sdm)

This will output the modified model:

┌───┬───────┬─────┬─────┐
│   │ Name  │ Age │ Sex │
├───┼───────┼─────┼─────┤
│ 0 │ Alice │  20 │ F   │
│ 1 │ Beth  │  27 │ F   │
│ 2 │ Chris │  30 │ M   │
│ 3 │ David │  35 │ M   │
└───┴───────┴─────┴─────┘
[4 rows x 3 columns]
Change Log:
  • Version 0.6.0 (2024-05-14):
    • Backward incompatible changes made to arguments and behavior, added index and on_conflict parameters for greater specificity and to align with broader conventions surrounding insert methods.

Note

  • Use on_conflict = 'ignore' to take no action if row already exists, and on_conflict = 'replace' to replace it.

  • See SQLDataModel.append_row() for appending rows at the next available index instead of insertion at index.

iter_rows(min_row: int = None, max_row: int = None, index: bool = True, include_headers: bool = False) Generator

Returns a generator object of the rows in the model from min_row to max_row.

Parameters:
  • min_row (int, optional) – The minimum row index to start iterating from (inclusive). Defaults to None.

  • max_row (int, optional) – The maximum row index to iterate up to (exclusive). Defaults to None.

  • index (bool, optional) – Whether to include the row index in the output. Defaults to True.

  • include_headers (bool, optional) – Whether to include headers as the first row. Defaults to False.

Yields:

Generator – Rows from the specified range, including headers if specified.

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# Iterate over the rows
for row in sdm.iter_rows(min_row=2, max_row=4):
    pass # Do stuff
Change Log:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

iter_tuples(include_idx_col: bool = False) Generator

Returns a generator object of the SQLDataModel as namedtuples using current headers as field names.

Parameters:

include_idx_col (bool, optional) – Whether to include the index column in the namedtuples. Defaults to False.

Raises:

ValueError – Raised if headers are not valid Python identifiers. Use SQLDataModel.normalize_headers() method to fix.

Yields:

Generator – Namedtuples representing rows with field names based on current headers.

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# Iterate over the namedtuples
for row_tuple in sdm.iter_tuples(include_idx_col=True):
    pass # Do stuff with namedtuples
max() SQLDataModel

Returns a new SQLDataModel containing the maximum value of all non-null values for each column in a row-wise orientation.

Returns:

A new SQLDataModel containing the maximum non-null value for each column.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data with missing values
headers = ['Name', 'Age', 'Gender', 'Tenure']
data = [
    ('Alice', 25, 'Female', 1.0),
    ('Bob', None, 'Male', 2.7),
    ('Charlie', 30, 'Male', None),
    ('David', None, 'Male', 3.8)
]

# Create the model
sdm = SQLDataModel(data, headers)

# Get maximum values
min_values = sdm.min()

# View result
print(min_values)

This will output the maximum value of all non-null values for each column:

┌───────┬─────┬────────┬────────┐
│ Name  │ Age │ Gender │ Tenure │
├───────┼─────┼────────┼────────┤
│ David │  30 │ Male   │   3.80 │
└───────┴─────┴────────┴────────┘
[1 rows x 4 columns]

Note

max_column_width

The maximum column width in characters to use for string representations of the data. Default is 38.

Type:

int

mean() SQLDataModel

Returns a new SQLDataModel containing the mean value of all viable columns in the current model. Calculated by sum(x_i, ..., x_n) * (1 / N)

Returns:

A new SQLDataModel containing the mean values of each column.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Birthday', 'Height', 'Date of Hire']
data = [
    ('John', 30, '1994-06-15', 175.3, '2018-03-03 11:20:19'),
    ('Alice', 28, '1996-11-20', 162.0, '2023-04-24 08:45:30'),
    ('Travis', 37, '1987-01-07', 185.8, '2012-10-06 15:30:40')
]

# Create the model and infer correct types
sdm = SQLDataModel(data, headers, infer_dtypes=True)

# View full model
print(sdm)

This will output the sample model we’ll be using to calculate mean values for:

┌────────┬─────┬────────────┬─────────┬─────────────────────┐
│ Name   │ Age │ Birthday   │  Height │ Date of Hire        │
├────────┼─────┼────────────┼─────────┼─────────────────────┤
│ John   │  30 │ 1994-06-15 │  175.30 │ 2018-03-03 11:20:19 │
│ Alice  │  28 │ 1996-11-20 │  162.00 │ 2023-04-24 08:45:30 │
│ Travis │  37 │ 1987-01-07 │  185.80 │ 2012-10-06 15:30:40 │
└────────┴─────┴────────────┴─────────┴─────────────────────┘
[3 rows x 5 columns]

Now let’s find the mean values:

# Calculate the mean values
sdm_mean = sdm.mean()

# View result
print(sdm_mean)

This will output the mean values for the “Age”, “Birthday”, “Height” and “Date of Hire” columns:

┌──────┬────────┬────────────┬─────────┬─────────────────────┐
│ Name │    Age │ Birthday   │  Height │ Date of Hire        │
├──────┼────────┼────────────┼─────────┼─────────────────────┤
│ NaN  │  31.67 │ 1992-10-14 │  174.37 │ 2018-01-30 11:52:09 │
└──────┴────────┴────────────┴─────────┴─────────────────────┘
[1 rows x 5 columns]

Note

  • Only non-null values are included in the calculation of the sum and the total number of values in the column, use SQLDataModel.fillna() to fill null values.

  • For date and datetime columns values are converted to julian days prior to calculation and recast into original data type, some imprecision may occur as a result.

  • See SQLDataModel.min() for returning the minimum value, SQLDataModel.max() for maximum value, and SQLDataModel.describe() for descriptive statical values.

merge(merge_with: SQLDataModel = None, how: Literal['left', 'right', 'inner', 'full outer', 'cross'] = 'left', left_on: str = None, right_on: str = None, include_join_column: bool = False) SQLDataModel

Merges two SQLDataModel instances based on specified columns and merge type, how, returning the result as a new instance. If the join column shares the same name in both models, left_on and right_on column arguments are not required and will be inferred. Otherwise, explicit arguments for both are required.

Parameters:
  • merge_with (SQLDataModel) – The SQLDataModel to merge with the current model.

  • how (Literal["left", "right", "inner", "full outer", "cross"]) – The type of merge to perform.

  • left_on (str) – The column name from the current model to use as the left join key.

  • right_on (str) – The column name from the merge_with model to use as the right join key.

  • include_join_column (bool) – If the shared column being used as the join key should be included from both tables. Default is False.

Raises:
  • TypeError – If merge_with is not of type SQLDataModel.

  • DimensionError – If no shared column exists, and explicit left_on and right_on arguments are not provided.

  • ValueError – If the specified left_on or right_on column is not found in the respective models.

Returns:

A new SQLDataModel containing the product of the merged result.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Left table data with ID column
left_headers = ["Name", "Age", "ID"]
left_data = [
    ["Bob", 35, 1],
    ["Alice", 30, 5],
    ["David", 40, None],
    ["Charlie", 25, 2]
]
# Right table data with shared ID column
right_headers = ["ID", "Country"]
right_data = [
    [1, "USA"],
    [2, "Germany"],
    [3, "France"],
    [4, "Latvia"]
]

# Create the left and right tables
sdm_left = SQLDataModel(left_data, left_headers)
sdm_right = SQLDataModel(right_data, right_headers)

Here are the left and right tables we will be joining:

Left Table:                     Right Table:
┌─────────┬──────┬──────┐       ┌──────┬─────────┐
│ Name    │  Age │   ID │       │   ID │ Country │
├─────────┼──────┼──────┤       ├──────┼─────────┤
│ Bob     │   35 │    1 │       │    1 │ USA     │
│ Alice   │   30 │    5 │       │    2 │ Germany │
│ David   │   40 │      │       │    3 │ France  │
│ Charlie │   25 │    2 │       │    4 │ Latvia  │
└─────────┴──────┴──────┘       └──────┴─────────┘
[4 rows x 3 columns]            [4 rows x 2 columns]
Left Join
# Create a model by performing a left join with the tables
sdm_joined = sdm_left.merge(sdm_right, how="left")

# View result
print(sdm_joined)

This will output:

Left Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Alice   │   30 │    5 │         │
│ David   │   40 │      │         │
│ Charlie │   25 │    2 │ Germany │
└─────────┴──────┴──────┴─────────┘
[4 rows x 4 columns]
Right Join
# Create a model by performing a right join with the tables
sdm_joined = sdm_left.merge(sdm_right, how="right")

# View result
print(sdm_joined)

This will output:

Right Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Charlie │   25 │    2 │ Germany │
│         │      │      │ France  │
│         │      │      │ Latvia  │
└─────────┴──────┴──────┴─────────┘
[4 rows x 4 columns]
Inner Join
# Create a model by performing an inner join with the tables
sdm_joined = sdm_left.merge(sdm_right, how="inner")

# View result
print(sdm_joined)

This will output:

Inner Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Charlie │   25 │    2 │ Germany │
└─────────┴──────┴──────┴─────────┘
[2 rows x 4 columns]
Full Outer Join
# Create a model by performing a full outer join with the tables
sdm_joined = sdm_left.merge(sdm_right, how="full outer")

# View result
print(sdm_joined)

This will output:

Full Outer Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Alice   │   30 │    5 │         │
│ David   │   40 │      │         │
│ Charlie │   25 │    2 │ Germany │
│         │      │      │ France  │
│         │      │      │ Latvia  │
└─────────┴──────┴──────┴─────────┘
[6 rows x 4 columns]
Cross Join
# Create a model by performing a cross join with the tables
sdm_joined = sdm_left.merge(sdm_right, how="cross")

# View result
print(sdm_joined)

This will output:

Cross Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Bob     │   35 │    1 │ Germany │
│ Bob     │   35 │    1 │ France  │
│ Bob     │   35 │    1 │ Latvia  │
│ Alice   │   30 │    5 │ USA     │
│ Alice   │   30 │    5 │ Germany │
│ Alice   │   30 │    5 │ France  │
│ Alice   │   30 │    5 │ Latvia  │
│ David   │   40 │      │ USA     │
│ David   │   40 │      │ Germany │
│ David   │   40 │      │ France  │
│ David   │   40 │      │ Latvia  │
│ Charlie │   25 │    2 │ USA     │
│ Charlie │   25 │    2 │ Germany │
│ Charlie │   25 │    2 │ France  │
│ Charlie │   25 │    2 │ Latvia  │
└─────────┴──────┴──────┴─────────┘
[16 rows x 4 columns]

Note

  • If include_join_column=False then only the left_on join column is included in the result, with the right_on column removed to avoid redundant shared key values.

  • If include_join_column=True then all the columns from both models are included in the result, with aliasing to avoid naming conflicts, see SQLDataModel.alias_duplicates() for details.

  • The resulting SQLDataModel is created based on the sqlite3 join definition and specified columns and merge type, for details see sqlite3 documentation.

min() SQLDataModel

Returns a new SQLDataModel containing the minimum value of all non-null values for each column in a row-wise orientation.

Returns:

A new SQLDataModel containing the minimum non-null value for each column.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data with missing values
headers = ['Name', 'Age', 'Gender', 'Tenure']
data = [
    ('Alice', 25, 'Female', 1.0),
    ('Bob', None, 'Male', 2.7),
    ('Charlie', 30, 'Male', None),
    ('David', None, 'Male', 3.8)
]

# Create the model
sdm = SQLDataModel(data, headers)

# Get minimum values
min_values = sdm.min()

# View result
print(min_values)

This will output the minimum value of all non-null values for each column:

┌───────┬─────┬────────┬────────┐
│ Name  │ Age │ Gender │ Tenure │
├───────┼─────┼────────┼────────┤
│ Alice │  25 │ Female │   1.00 │
└───────┴─────┴────────┴────────┘
[1 rows x 4 columns]

Note

min_column_width

The minimum column width in characters to use for string representations of the data. Default is 3.

Type:

int

normalize_headers(apply_function: Callable = None) None

Reformats the current SQLDataModel headers into an uncased normalized form using alphanumeric characters only. Wraps SQLDataModel.set_headers().

Parameters:

apply_function (Callable, optional) – Specify an alternative normalization pattern. When None, the pattern '[^0-9a-z _]+' will be used on uncased values.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create model
sdm = SQLDataModel.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# Use default normalization scheme, uncased and strips invalid SQL identifiers
sdm.normalize_headers()

# Get renamed headers after default normalization
sdm.get_headers() # now outputs ['first_name', 'last_name', 'salary']

# Or use custom renaming scheme
sdm.normalize_headers(lambda x: x.upper())

# Get renamed headers again
sdm.get_headers() # now outputs ['FIRST_NAME', 'LAST_NAME', 'SALARY']
rename_column(column: int | str, new_column_name: str) None

Renames a column in the SQLDataModel at the specified index or using the old column name with the provided value in new_column_name.

Parameters:
  • column (int|str) – The index or current str value of the column to be renamed.

  • new_column_name (str) – The new name as a str value for the specified column.

Raises:
  • TypeError – If the column or new_column_name parameters are invalid types.

  • IndexError – If the provided column index is outside the current column range.

  • SQLProgrammingError – If there is an issue with the SQL execution during the column renaming.

Example:

from SQLDataModel import SQLDataModel

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27)
    ,(1, 'sarah', 'west', 29)
    ,(2, 'mike', 'harlin', 36)
    ,(3, 'pat', 'douglas', 42)
]

# Create the model with sample data
sdm = SQLDataModel(data,headers)

# Example: Rename the column at index 1 to 'first_name'
sdm.rename_column(1, 'first_name')

# Get current values
new_headers = sdm.get_headers()

# Outputs ['first_name', 'last', 'age']
print(new_headers)

Note

  • The method allows renaming a column identified by its index in the SQLDataModel.

  • Handles negative indices by adjusting them relative to the end of the column range.

  • If an error occurs during SQL execution, it rolls back the changes and raises a SQLProgrammingError with an informative message.

replace(pattern: str, replacement: str, inplace: bool = False, **kwargs) SQLDataModel

Replaces matching occurrences of a specified pattern with a replacement value in the SQLDataModel instance. If inplace is True, the method updates the existing SQLDataModel; otherwise, it returns a new SQLDataModel with the replacements applied.

Parameters:
  • pattern (str) – The substring or regular expression pattern to search for in each column.

  • replacement (str) – The string to replace the matched pattern with.

  • inplace (bool, optional) – If True, modifies the current SQLDataModel instance in-place. Default is False.

  • **kwargs – Additional keyword arguments to be passed to the execute_fetch method when not in-place.

Raises:

TypeError – If the pattern or replacement parameters are invalid types.

Returns:

If inplace=True, modifies the current instance in-place and returns None. Otherwise, returns a new SQLDataModel with the specified replacements applied.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

headers = ['first', 'last', 'age', 'service']
data = [
    ('John', 'Smith', 27, 1.22),
    ('Sarah', 'West', 39, 0.7),
    ('Mike', 'Harlin', 36, 3),
    ('Pat', 'Douglas', 42, 11.5)
]

# Create the model
sdm = SQLDataModel(data, headers,display_float_precision=2, display_index=False)

# Replace 'John' in the 'first' column
sdm['first'] = sdm['first'].replace("John","Jane")

# View model
print(sdm)

This will output:

┌───────┬─────────┬──────┬─────────┐
│ first │ last    │  age │ service │
├───────┼─────────┼──────┼─────────┤
│ Jane  │ Smith   │   27 │    1.22 │
│ Sarah │ West    │   39 │    0.70 │
│ Mike  │ Harlin  │   36 │    3.00 │
│ Pat   │ Douglas │   42 │   11.50 │
└───────┴─────────┴──────┴─────────┘
[4 rows x 4 columns]
reset_index(start_index: int = 0) None

Resets the index of the SQLDataModel instance inplace to zero-based sequential autoincrement, or to specified start_index base with sequential incrementation.

Parameters:

start_index (int, optional) – The starting index for the reset operation. Defaults to 0.

Raises:
  • TypeError – If provided start_index argument is not of type int

  • ValueError – If the specified start_index is greater than the minimum index in the current model.

  • SQLProgrammingError – If reset index execution results in constraint violation or programming error.

Example:

from SQLDataModel import SQLDataModel

headers = ['idx', 'first', 'last', 'age', 'service']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 39, 0.7),
    (2, 'mike', 'harlin', 36, 3),
    (3, 'pat', 'douglas', 42, 11.5)
]

# Create the model
sdm = SQLDataModel(data, headers)

# View current state
print(sdm)

This will output:

┌─────┬────────┬─────────┬────────┬─────────┐
│     │ first  │ last    │    age │ service │
├─────┼────────┼─────────┼────────┼─────────┤
│ 994 │ john   │ smith   │     27 │    1.22 │
│ 995 │ sarah  │ west    │     39 │    0.70 │
│ 996 │ mike   │ harlin  │     36 │    3.00 │
│ 997 │ pat    │ douglas │     42 │   11.50 │
└─────┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]

Now reset the index column:

from SQLDataModel import SQLDataModel

# Reset the index with default start value
sdm.reset_index()

# View updated model
print(sdm)

This will output:

┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john   │ smith   │     27 │    1.22 │
│ 1 │ sarah  │ west    │     39 │    0.70 │
│ 2 │ mike   │ harlin  │     36 │    3.00 │
│ 3 │ pat    │ douglas │     42 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]

Reset the index to a custom value:

from SQLDataModel import SQLDataModel

# Reset the index with a different value
sdm.reset_index(start_index = -3)

# View updated model
print(sdm)

This will output:

┌────┬────────┬─────────┬────────┬─────────┐
│    │ first  │ last    │    age │ service │
├────┼────────┼─────────┼────────┼─────────┤
│ -3 │ john   │ smith   │     27 │    1.22 │
│ -2 │ sarah  │ west    │     39 │    0.70 │
│ -1 │ mike   │ harlin  │     36 │    3.00 │
│  0 │ pat    │ douglas │     42 │   11.50 │
└────┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]

Note

  • The current index should be viewed more as a soft row number, to assign hard indicies use SQLDataModel.freeze_index() method.

  • Setting start_index too a very large negative or positive integer made lead to unpredictable behavior.

row_count

The current row count of the model.

Type:

int

sample(n_samples: float | int = 0.05, **kwargs) SQLDataModel

Return a random sample of size n_samples as a new SQLDataModel.

Parameters:

n_samples (float | int) – Number of rows or proportion of rows to sample. Default set to 0.05, proportional to 5% of the current SQLDataModel.row_count. If n_samples is an integer, it represents the exact number of rows to sample where 0 < n_samples <= row_count. If n_samples is a float, it represents the proportion of rows to sample where 0.0 < n_samples <= 1.0.

Returns:

A new SQLDataModel instance containing the sampled rows.

Return type:

SQLDataModel

Raises:
  • TypeError – If the n_samples parameter is not of type ‘int’ or ‘float’.

  • ValueError – If the n_samples value is invalid or out of range.

This method generates a random sample of rows from the current SQLDataModel. The number of rows to sample can be specified either as an integer representing the exact number of rows or as a float representing the proportion of rows to sample. The sampled rows are returned as a new SQLDataModel instance.

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Amount'])

# Example 1: Sample 10 random rows
sample_result = sdm.sample(n_samples=10)

# Create the model
sdm2 = SQLDataModel.from_csv('another_example.csv', headers=['Code', 'Description', 'Price'])

# Example 2: Sample 20% of rows
sample_result2 = sdm2.sample(n_samples=0.2)

Note

  • If the current model’s SQLDataModel.row_count value is less than the sample size, the current row count will be used instead.

set_column_alignment(alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic') None

Sets the default alignment behavior for SQLDataModel when repr or print is called, modifies column_alignment attribute. Default behavior set to 'dynamic', which right-aligns numeric data types, left-aligns all other types, with headers matching value alignment.

Parameters:

alignment (str) – The column alignment setting to use. 'dynamic': Default behavior, dynamically aligns columns based on column data types. 'left': Left-align all column values. 'center': Center-align all column values. 'right': Right-align all column values.

Raises:
  • TypeError – If the argument for alignment is not of type ‘str’.

  • ValueError – If the provided alignment is not one of ‘dynamic’, ‘left’, ‘center’, ‘right’.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Set to right-align columns
sdm.set_column_alignment('right')

# Output
print(sdm)

This will output the model with values right-aligned:

┌───┬────────┬─────────┬────────┬─────────┐
│   │  first │    last │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │   john │   smith │     27 │    1.22 │
│ 1 │  sarah │    west │     39 │    0.70 │
│ 2 │   mike │  harlin │     36 │    3.00 │
│ 3 │    pat │ douglas │     42 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘

Setting columns to be left-aligned:

# Set to left-align
sdm.set_column_alignment('left')

# Output
print(sdm)

This will output the model with left-aligned values instead:

┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │ age    │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john   │ smith   │  27    │  1.22   │
│ 1 │ sarah  │ west    │  39    │  0.70   │
│ 2 │ mike   │ harlin  │  36    │  3.00   │
│ 3 │ pat    │ douglas │  42    │  11.50  │
└───┴────────┴─────────┴────────┴─────────┘

Note

  • Use SQLDataModel.get_column_alignment() to return the current column alignment setting.

  • When using ‘center’, if the column contents cannot be perfectly centralized, the left side will be favored.

  • Use ‘dynamic’ to return to default column alignment, which is right-aligned for numeric types and left-aligned for others.

  • See SQLDataModel.set_table_style() for modifying table format and available styles.

set_column_dtypes(column: str | int, dtype: Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str']) None

Casts the specified column into the provided python dtype. The datatype must be a valid convertable python datatype to map to an equivalent SQL datatype.

Parameters:
  • column (str or int) – The name or index of the column to be cast, must be current header or within range of current column_count

  • dtype (Literal['bool', 'bytes', 'datetime', 'float', 'int', 'None', 'str']) – The target python data type for the specified column.

Raises:
  • TypeError – If column is not of type ‘str’ or ‘int’.

  • IndexError – If column is an integer and the index is outside of the current model range.

  • ValueError – If column is a string and the column is not found in the current model.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

headers = ['idx', 'first', 'last', 'age']
data = [
(0, 'john', 'smith', 27)
,(1, 'sarah', 'west', 29)
,(2, 'mike', 'harlin', 36)
,(3, 'pat', 'douglas', 42)
]

# Create the SQLDataModel object
sdm = SQLDataModel(data, headers)

# Original dtype for comparison
old_dtype = sdm.get_column_dtypes('age')

# Set the data type of the 'age' column to 'float'
sdm.set_column_dtypes('age', 'float')

# Confirm column dtype
new_dtype = sdm.get_column_dtypes('age')

# View result
print(f"Age dtype: {old_dtype} -> {new_dtype}")

This will output:

Age dtype: int -> float

Warning

  • Type casting will coerce any nonconforming values to the dtype being set, this means data will be lost if casted incorrectly.

set_display_color(color: str | tuple)

Sets the table string representation color when SQLDataModel is displayed in the terminal.

Parameters:

color (str or tuple) – Color to set. Accepts hex value (e.g., '#A6D7E8') or tuple of RGB values (e.g., (166, 215, 232)).

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['Name', 'Age', 'Salary'])

# Set color with hex value
sdm.set_display_color('#A6D7E8')

Note

  • By default, no color styling is applied and the native terminal color is used.

  • To use rgb values, ensure a single tuple is provided as an argument.

set_display_float_precision(float_precision: int) None

Sets the current float display precision to the specified value for use in the repr method of the SQLDataModel when representing float data types. Note that this precision limit is overridden by the max_column_width value if the precision limit exceeds the specified maximum width.

Parameters:

float_precision (int) – The desired float display precision to be used for real number values.

Raises:
  • TypeError – If the float_precision argument is not of type ‘int’.

  • ValueError – If the float_precision argument is a negative value, as it must be a valid f-string precision identifier.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

headers = ['idx', 'first', 'last', 'age', 'service_time']
data = [
    (0, 'john', 'smith', 27, 1.22)
    ,(1, 'sarah', 'west', 0.7)
    ,(2, 'mike', 'harlin', 3)
    ,(3, 'pat', 'douglas', 11.5)
]

# Create the model with sample data
sdm = SQLDataModel(data,headers)

# Example: Set the float display precision to 2
sdm.set_display_float_precision(2)

# View model
print(sdm)

This will output:

┌───┬────────┬─────────┬────────┬──────────────┐
│   │ first  │ last    │    age │ service_time │
├───┼────────┼─────────┼────────┼──────────────┤
│ 0 │ john   │ smith   │     27 │         2.10 │
│ 1 │ sarah  │ west    │     29 │         0.80 │
│ 2 │ mike   │ harlin  │     36 │         1.30 │
│ 3 │ pat    │ douglas │     42 │         7.02 │
└───┴────────┴─────────┴────────┴──────────────┘
# Get the updated float display precision
updated_precision = sdm.get_display_float_precision()

# Outputs 2
print(updated_precision)

Note

  • The display_float_precision attribute only affects the precision for displaying real or floating point values.

  • The actual precision of the stored value in the model is unaffected by the value set.

set_display_index(display_index: bool) None

Sets the value for SQLDataModel.display_index to enable or disable the inclusion of the SQLDataModel index value in print or repr calls.

Parameters:

display_index (bool) – Whether or not to include the index in SQLDataModel representations.

Raises:

TypeError – If the provided argument is not a boolean value.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Disable displaying index
sdm.set_display_index(False)

Note

set_display_max_rows(rows: int | None) None

Sets value at SQLDataModel.display_max_rows to limit maximum rows displayed when repr or print is called. Use rows = None to derive max number to display from the current terminal height.

Parameters:

rows (int) – The maximum number of rows to display.

Raises:
  • TypeError – If the provided argument is not None or is not an integer.

  • IndexError – If the provided value is an integer less than or equal to 0.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Any call to `print` or `repr` will be restricted to 500 max rows
sdm.set_display_max_rows(500)

# Alternatively, auto-detect dimensions by setting to `None`
sdm.set_display_max_rows(None)

Note

set_headers(new_headers: list[str]) None

Renames the current SQLDataModel headers to values provided in new_headers. Headers must have the same dimensions and match existing headers.

Parameters:

new_headers (list) – A list of new header names. It must have the same dimensions as the existing headers.

Raises:
  • TypeError – If the new_headers type is not a valid type (list or tuple).

  • DimensionError – If the length of new_headers does not match the column count.

  • TypeError – If the type of the first element in new_headers is not a valid type (str, int, or float).

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create model
sdm = SQLDataModel.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# Set new headers
sdm.set_headers(['First_Name', 'Last_Name', 'Payment'])
set_max_column_width(width: int) None

Set max_column_width as the maximum number of characters per column when repr or print is called.

Parameters:

width (int) – The maximum width for each column.

Returns:

Sets the max_column_width property.

Return type:

None

Note

  • If max_column_width is set to a value below the current min_column_width property, the maximum width will override the minimum width.

  • The minimum required width is 2, when max_column_width < 2, 2 will be used regardless of the width provided.

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Change the max column width for the table representation
sdm.set_max_column_width(20)
set_min_column_width(width: int) None

Set min_column_width as the minimum number of characters per column when repr or print is called.

Parameters:

width (int) – The minimum width for each column.

Returns:

Sets the min_column_width property.

Return type:

None

Note

  • If min_column_width is set to a value below the current max_column_width property, the maximum width will override the minimum width.

  • The minimum required width is 2, when min_column_width < 2, 2 will be used regardless of the width provided.

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Set a new minimum column width value
sdm.set_min_column_width(8)

# Check updated value
print(sdm.get_min_column_width) # 8
set_model_name(new_name: str) None

Sets the new SQLDataModel table name that will be used as an alias for any SQL queries executed by the user or internally.

Parameters:

new_name (str) – The new table name for the SQLDataModel.

Raises:

SQLProgrammingError – If unable to rename the model table due to SQL execution failure.

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel.from_csv('example.csv', headers=['Column1', 'Column2'])

# Rename the model
sdm.set_model_name('custom_table')

Note

  • The provided value must be a valid SQL table name.

  • This alias will be reset to the default value for any new SQLDataModel instances: 'sdm'.

set_table_style(style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round'] = 'default') None

Sets the table style used for string representations of SQLDataModel.

Parameters:

style (Literal['ascii','bare','dash','default','double','list','markdown','outline','pandas','polars','postgresql','round']) – The table styling to set, use 'default' for original style.

Raises:

ValueError – If style provided is not one of the currently supported options ‘ascii’,’bare’,’dash’,’default’,’double’,’list’,’markdown’,’outline’,’pandas’,’polars’,’postgresql’ or ‘round’.

Returns:

None

Examples:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height', 'Birthday']
data = [
    ('Alice', 28, 162.08, '1996-11-20'),
    ('Bobby', 30, 175.36, '1994-06-15'),
    ('Craig', 37, 185.82, '1987-01-07'),
    ('David', 32, 179.75, '1992-12-28')
]

# Create the model
sdm = SQLDataModel(data, headers)

# Lets try the round style
sdm.set_table_style('round')

# View it
print(sdm)

This outputs the 'round' table style:

╭───────┬─────┬─────────┬────────────╮
│ Name  │ Age │  Height │ Birthday   │
├───────┼─────┼─────────┼────────────┤
│ Alice │  28 │  162.08 │ 1996-11-20 │
│ Bobby │  30 │  175.36 │ 1994-06-15 │
│ Craig │  37 │  185.82 │ 1987-01-07 │
│ David │  32 │  179.75 │ 1992-12-28 │
╰───────┴─────┴─────────┴────────────╯

Alternatively, set style = 'ascii' to format SQLDataModel in the ASCII style, the OG of terminal tables:

+-------+-----+---------+------------+
| Name  | Age |  Height | Birthday   |
+-------+-----+---------+------------+
| Alice |  28 |  162.08 | 1996-11-20 |
| Bobby |  30 |  175.36 | 1994-06-15 |
| Craig |  37 |  185.82 | 1987-01-07 |
| David |  32 |  179.75 | 1992-12-28 |
+-------+-----+---------+------------+

Set style = 'bare' to format SQLDataModel in the following style:

Name   Age   Height  Birthday
-------------------------------
Alice   28   162.08  1996-11-20
Bobby   30   175.36  1994-06-15
Craig   37   185.82  1987-01-07
David   32   179.75  1992-12-28

Set style = 'dash' to format SQLDataModel with dashes for internal borders:

┌───────┬─────┬─────────┬────────────┐
│ Name  ╎ Age ╎  Height ╎ Birthday   │
├╴╴╴╴╴╴╴┼╴╴╴╴╴┼╴╴╴╴╴╴╴╴╴┼╴╴╴╴╴╴╴╴╴╴╴╴┤
│ Alice ╎  28 ╎  162.08 ╎ 1996-11-20 │
│ Bobby ╎  30 ╎  175.36 ╎ 1994-06-15 │
│ Craig ╎  37 ╎  185.82 ╎ 1987-01-07 │
│ David ╎  32 ╎  179.75 ╎ 1992-12-28 │
└───────┴─────┴─────────┴────────────┘

Set style = 'default' to format SQLDataModel in the following style, which also happens to be the default styling applied:

┌───────┬─────┬─────────┬────────────┐
│ Name  │ Age │  Height │ Birthday   │
├───────┼─────┼─────────┼────────────┤
│ Alice │  28 │  162.08 │ 1996-11-20 │
│ Bobby │  30 │  175.36 │ 1994-06-15 │
│ Craig │  37 │  185.82 │ 1987-01-07 │
│ David │  32 │  179.75 │ 1992-12-28 │
└───────┴─────┴─────────┴────────────┘
[4 rows x 4 columns]

Set style = 'list' to format SQLDataModel as a list of values, similar to the SQLite CLI representation:

Name   Age   Height  Birthday
-----  ---  -------  ----------
Alice   28   162.08  1996-11-20
Bobby   30   175.36  1994-06-15
Craig   37   185.82  1987-01-07
David   32   179.75  1992-12-28

Set style = 'double' to format SQLDataModel using double line borders:

╔═══════╦═════╦═════════╦════════════╗
║ Name  ║ Age ║  Height ║ Birthday   ║
╠═══════╬═════╬═════════╬════════════╣
║ Alice ║  28 ║  162.08 ║ 1996-11-20 ║
║ Bobby ║  30 ║  175.36 ║ 1994-06-15 ║
║ Craig ║  37 ║  185.82 ║ 1987-01-07 ║
║ David ║  32 ║  179.75 ║ 1992-12-28 ║
╚═══════╩═════╩═════════╩════════════╝

Set style = 'markdown' to format SQLDataModel in the Markdown style:

| Name  | Age |  Height | Birthday   |
|-------|-----|---------|------------|
| Alice |  28 |  162.08 | 1996-11-20 |
| Bobby |  30 |  175.36 | 1994-06-15 |
| Craig |  37 |  185.82 | 1987-01-07 |
| David |  32 |  179.75 | 1992-12-28 |

Set style = 'outline' to format SQLDataModel in the following style:

┌─────────────────────────────────┐
│ Name   Age   Height  Birthday   │
├─────────────────────────────────┤
│ Alice   28   162.08  1996-11-20 │
│ Bobby   30   175.36  1994-06-15 │
│ Craig   37   185.82  1987-01-07 │
│ David   32   179.75  1992-12-28 │
└─────────────────────────────────┘

Set style = 'pandas' to format SQLDataModel in the style used by Pandas DataFrames:

Name   Age   Height  Birthday
Alice   28   162.08  1996-11-20
Bobby   30   175.36  1994-06-15
Craig   37   185.82  1987-01-07
David   32   179.75  1992-12-28

Set style = 'polars' to format SQLDataModel in the style used by Polars DataFrames:

┌───────┬─────┬─────────┬────────────┐
│ Name  ┆ Age ┆  Height ┆ Birthday   │
╞═══════╪═════╪═════════╪════════════╡
│ Alice ┆  28 ┆  162.08 ┆ 1996-11-20 │
│ Bobby ┆  30 ┆  175.36 ┆ 1994-06-15 │
│ Craig ┆  37 ┆  185.82 ┆ 1987-01-07 │
│ David ┆  32 ┆  179.75 ┆ 1992-12-28 │
└───────┴─────┴─────────┴────────────┘

Set style = 'postgresql' to format SQLDataModel in the style used by PostgreSQL:

Name  | Age |  Height | Birthday
------+-----+---------+-----------
Alice |  28 |  162.08 | 1996-11-20
Bobby |  30 |  175.36 | 1994-06-15
Craig |  37 |  185.82 | 1987-01-07
David |  32 |  179.75 | 1992-12-28
Change Log:
  • Version 0.3.11 (2024-04-18):
    • Removed 'thick' style and added 'list' style for greater variety of available formats.

Note

  • The labels given to certain styles are entirely subjective and do not in any way express original design or ownership of the styling used.

  • Legacy character sets on older terminals may not support all the character encodings required for some styles.

shape

The current dimensions of the model as a tuple of (rows, columns).

Type:

tuple[int, int]

sort(by: str | int | list = None, asc: bool = True) SQLDataModel

Sort columns in the dataset by the specified ordering. If no value is specified, the current SQLDataModel.sql_idx column is used with the default ordering asc = True.

Parameters:
  • by (str | int | list, optional) – The column or list of columns by which to sort the dataset. Defaults to sorting by the dataset’s index.

  • asc (bool, optional) – If True, sort in ascending order; if False, sort in descending order. Defaults to ascending order.

Raises:
  • TypeError – If value for by argument is not one of type ‘str’, ‘int’ or ‘list’.

  • ValueError – If a specified column in by is not found in the current dataset or is an invalid column.

  • IndexError – If columns are indexed by integer but are outside of the current model range.

Returns:

A new instance of SQLDataModel with columns sorted according to the specified ordering.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01'),
    ('Sarah', 'West', 39, 0.7, '2023-10-01'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18')
]

# Create the model
sdm = SQLDataModel(data, headers)

# Sort by last name column
sorted_sdm = sdm.sort('last')

# View sorted model
print(sorted_sdm)

This will output:

┌───┬───────┬─────────┬──────┬─────────┬────────────┐
│   │ first │ last    │  age │ service │ hire_date  │
├───┼───────┼─────────┼──────┼─────────┼────────────┤
│ 0 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │
│ 1 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │
│ 2 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │
│ 3 │ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │
│ 4 │ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │
└───┴───────┴─────────┴──────┴─────────┴────────────┘
[5 rows x 5 columns]

Sort by multiple columns:

# Sort by multiple columns in descending order
sorted_sdm = sdm.sort(['age','hire_date'], asc=False)

# View sorted
print(sorted_sdm)

This will output:

┌───┬───────┬─────────┬──────┬─────────┬────────────┐
│   │ first │ last    │  age │ service │ hire_date  │
├───┼───────┼─────────┼──────┼─────────┼────────────┤
│ 0 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │
│ 1 │ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │
│ 2 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │
│ 3 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │
│ 4 │ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │
└───┴───────┴─────────┴──────┴─────────┴────────────┘
[5 rows x 5 columns]
Change Log:
  • Version 0.5.1 (2024-05-10):
    • Modified to allow integer indexing for column sort order in by argument.

Note

  • Standard sorting process for sqlite3 is used, whereby the ordering prefers the first column mentioned to the last.

  • Ascending and descending ordering follows this order of operations for multiple columns as well.

sql_db_conn

The in-memory sqlite3 connection object in use by the model.

Type:

sqlite3.Connection

sql_idx

The index column name applied to the sqlite3 in-memory representation of the model. Default is 'idx'

Type:

str

sql_model

The table name applied to the sqlite3 in-memory representation of the model. Default is 'sdm'

Type:

str

static sqlite_cast_type_format(param: str = '?', dtype: Literal['None', 'int', 'float', 'str', 'bytes', 'date', 'datetime', 'NoneType', 'bool'] = 'str', as_binding: bool = True, as_alias: bool = False)

Formats the specified param to be cast consistently into the python type specified for insert params or as a named alias param.

Parameters:
  • param (str) – The parameter to be formatted.

  • dtype (Literal['None', 'int', 'float', 'str', 'bytes', 'date', 'datetime', 'NoneType', 'bool']) – The python data type of the parameter as a string.

  • as_binding (bool, optional) – Whether to format as a binding parameter (default is True).

  • as_alias (bool, optional) – Whether to include an alias for the parameter (default is False).

Returns:

The parameter formatted for SQL type casting.

Return type:

str

Note

  • This function provides consistent formatting for casting parameters into specific data types for SQLite, changing it will lead to unexpected behaviors.

static sqlite_printf_format(column: str, dtype: str, max_pad_width: int, float_precision: int = 4, alignment: str = None) str

Formats SQLite SELECT clauses based on column parameters to provide preformatted fetches, providing most of the formatting for repr output.

Parameters:
  • column (str) – The name of the column.

  • dtype (str) – The data type of the column (‘float’, ‘int’, ‘index’, or other).

  • max_pad_width (int) – The maximum width to pad the output.

  • float_precision (int, optional) – The precision for floating-point numbers (default is 4).

  • alignment (str, optional) – The alignment of the output (‘<’, ‘>’, or None for no alignment).

Returns:

The formatted SELECT clause for SQLite.

Return type:

str

Note

  • This function generates SQLite SELECT clauses for single column only.

  • The output preformats SELECT result to fit repr method for tabular output.

  • The return str is not valid SQL by itself, representing only the single column select portion.

static_py_to_sql_map_dict

The data type mapping to use when converting python types to SQL column types.

Type:

dict

static_sql_to_py_map_dict

The data type mapping to use when converting SQL column types to python types.

Type:

dict

strip(characters: str = None, str_dtype_only: bool = True, inplace: bool = False) SQLDataModel | None

Removes the specified characters from the beginning and end of each value in the current SQLDataModel removing leading and trailing whitespace characters by default.

Parameters:
  • characters (str, optional) – The characters to remove from both ends of the value. Default is None, removing whitespace (' ', '\t', '\n', '\r').

  • str_dtype_only (bool, optional) – If True, only columns with dtype = ‘str’ are stripped, otherwise all columns are stripped. Default is True.

  • inplace (bool, optional) – If True, modifies the current SQLDataModel instance in-place. Default is False.

Raises:

TypeError – If characters argument is provided and is not of type 'str' representing unordered characters to remove.

Returns:

If inplace=False, returns a new SQLDataModel with the stripped values. Otherwise modifies the current instance in-place returning None.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Create a single item model
sdm = SQLDataModel([[' Hello, World! ']])

# Strip whitespace and print
print(sdm.strip())

This will output the model after stripping the leading and trailing whitespace characters:

┌───┬───────────────┐
│   │ 0             │
├───┼───────────────┤
│ 0 │ Hello, World! │
└───┴───────────────┘
[1 rows x 1 columns]

Non-whitespace characters can also be stripped:

from SQLDataModel import SQLDataModel

headers = ['Col A', 'Col B', 'Col C']
data = [
    ['A1', 'B1', 'C1'],
    ['A2', 'B2', 'C2'],
    ['A3', 'B3', 'C3']
]

# Create the sample model
sdm = SQLDataModel(data, headers)

# Strip leading and trailing 'A' character
sdm_stripped = sdm.strip('A')

# View result
print(sdm_stripped)

This will output a new model where any leading and trailing ‘A’ characters have been removed:

┌───────┬───────┬───────┐
│ Col A │ Col B │ Col C │
├───────┼───────┼───────┤
│ 1     │ B1    │ C1    │
│ 2     │ B2    │ C2    │
│ 3     │ B3    │ C3    │
└───────┴───────┴───────┘
[3 rows x 3 columns]

Multiple characters can be stripped, and the model modified inplace:

# Strip multiple characters and this time modify model inplace
sdm.strip('123', inplace=True)

# View result
print(sdm)

This will output the modified model after stripping leading and trailing ‘123’ characters:

┌───────┬───────┬───────┐
│ Col A │ Col B │ Col C │
├───────┼───────┼───────┤
│ A     │ B     │ C     │
│ A     │ B     │ C     │
│ A     │ B     │ C     │
└───────┴───────┴───────┘
[3 rows x 3 columns]

Note

  • For string replacement instead of string removal, see SQLDataModel.replace().

  • When using str_dtype_only = False, numeric values may be modified due to SQLite’s type affinity rules.

  • This method is equivalent to the SQLite trim(string, character) function, wrapping and passing the equivalent arguments.

table_style

The table style used for string representations of the model. Available styles are 'ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql' or 'round'. Defaults to 'default' table style.

Type:

str

tail(n_rows: int = 5) SQLDataModel

Returns the last n_rows of the current SQLDataModel.

Parameters:

n_rows (int, optional) – Number of rows to return. Defaults to 5.

Raises:

TypeError – If n_rows argument is not of type ‘int’ representing the number of rows to return from the tail of the model.

Returns:

A new SQLDataModel instance containing the specified number of rows.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Countries data available for sample dataset
url = 'https://developers.google.com/public-data/docs/canonical/countries_csv'

# Create the model
sdm = SQLDataModel.from_html(url)

# Get tail of model
sdm_tail = sdm.tail()

# View it
print(sdm_tail)

This will grab the bottom 5 rows by default:

┌─────┬─────────┬──────────┬───────────┬───────────────────┐
│     │ country │ latitude │ longitude │ name              │
├─────┼─────────┼──────────┼───────────┼───────────────────┤
│ 240 │ WF      │ -13.7688 │ -177.1561 │ Wallis and Futuna │
│ 241 │ EH      │  24.2155 │  -12.8858 │ Western Sahara    │
│ 242 │ YE      │  15.5527 │   48.5164 │ Yemen             │
│ 243 │ ZM      │ -13.1339 │   27.8493 │ Zambia            │
│ 244 │ ZW      │ -19.0154 │   29.1549 │ Zimbabwe          │
└─────┴─────────┴──────────┴───────────┴───────────────────┘
[5 rows x 4 columns]

Note

  • See related SQLDataModel.head() for the opposite, grabbing the top n_rows from the current model.

to_csv(filename: str = None, delimiter: str = ',', quotechar: str = '"', lineterminator: str = '\r\n', na_rep: str = 'None', encoding: str = 'utf-8', index: bool = False, **kwargs) str | None

Writes SQLDataModel to the specified file if filename argument if provided, otherwise returns the model directly as a CSV formatted string literal.

Parameters:
  • filename (str) – The name of the CSV file to which the data will be written. Default is None, returning as raw literal.

  • delimiter (str, optional) – The delimiter to use for separating values. Default is ‘,’.

  • quotechar (str, optional) – The character used to quote fields. Default is ‘”’.

  • lineterminator (str, optional) – The character used to terminate the row and move to a new line. Default is ‘rn’.

  • na_rep (str, optional) – String representation to use for null or missing values. Default is ‘None’.

  • encoding (str, optional) – The encoding to use when writing the model to a CSV file. Default is ‘utf-8’.

  • index (bool, optional) – If True, includes the index in the CSV file; if False, excludes the index. Default is False.

  • **kwargs – Additional arguments to be passed to the csv.writer constructor.

Returns:

If filename is None, returns the model as a delimited string literal, None if filename is provided, writing the model to the specified file as a CSV file.

Return type:

str | None

Example:

Returning CSV Literal
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data, headers)

# Generate the literal using tab delimiter
csv_literal = sdm.to_csv(delimiter='\t')

# View output
print(csv_literal)

This will output:

Name    Age     Height
John    30      175.3
Alice   28      162.0
Travis  35      185.8
Write to File
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data, headers)

# CSV filename
csv_file = 'persons.csv'

# Write to the file, keeping the index
sdm.to_csv(filename=csv_file, index=True)

Contents of persons.csv:

idx,Name,Age,Height
0,John,30,175.3
1,Alice,28,162.0
2,Travis,35,185.8
Change Log:
  • Version 0.6.4 (2024-05-17):
    • Added encoding parameter to pass to file handler when writing contents as CSV file and set default to utf-8 to align with expected SQLite codec.

  • Version 0.4.0 (2024-04-23):
    • Modified quoting behavior to avoid redundant quoting and to closely mimic csv module from standard library.

    • Added na_rep to fill null or missing values when generating output, useful for space delimited data and minimal quoting.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

  • When index=True, the sdm_index property determines the column name of the index in the result.

  • Modifying delimiter affects how the data is delimited when writing to filename and when returning as raw literal, any valid delimiter can be used.

  • Quoting behavior can be modified by providing an additional keywork arg such as quoting=1 to wrap all values in quotes, or quoting=2 to quote only non-numeric values, see csv.QUOTE_X enums for all options.

  • Use SQLDataModel.to_text() to pretty print table in specified style for visualizing output if strict delimiting is unnecessary.

  • See SQLDataModel.from_csv() for creating a new SQLDataModel from existing CSV data

to_dict(orient: Literal['rows', 'columns', 'list'] = 'rows', index: bool = None) dict | list[dict]

Converts the SQLDataModel instance to a dictionary or a list of dictionaries based on the specified orientation.

Parameters:
  • orient (Literal["rows", "columns", "list"]) – The orientation of the output, see examples for more detail. "rows": Returns a dictionary with index values as keys and row values as values. "columns": Returns a dictionary with column names as keys and column values as tuples. "list": Returns a list of dictionaries, where each dictionary represents a row.

  • index (bool) – Whether to include the index column in the output. Defaults to the display_index property.

Raises:

ValueError – if value for orient is not one of “rows”, “columns” or “list”.

Returns:

The converted data structure based on the specified orientation.

Return type:

dict | list[dict]

Examples:

Orient by Rows
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Col A','Col B', 'Col C']
data = [
    ['A,0', 'A,1', 'A,2'],
    ['B,0', 'B,1', 'B,2'],
    ['C,0', 'C,1', 'C,2']
]

# Create the model
sdm = SQLDataModel(data, headers)

# Convert to dictionary with rows as keys and values
rows_dict = sdm.to_dict(orient="rows")

# View output
for k, v in rows_dict.items():
    print(f"{k}: {v}")

This will output:

0: ('A,0', 'B,0', 'C,0')
1: ('A,1', 'B,1', 'C,1')
2: ('A,2', 'B,2', 'C,2')
Orient by Columns
# Convert to dictionary with columns as keys and rows as values
columns_dict = sdm.to_dict(orient="columns")

# View output
for k, v in columns_dict.items():
    print(f"{k}: {v}")

This will output:

Col A: ('A,0', 'A,1', 'A,2')
Col B: ('B,0', 'B,1', 'B,2')
Col C: ('C,0', 'C,1', 'C,2')
Orient by List
# Convert to list of dictionaries with each dictionary representing a row with columns as keys
list_dict = sdm.to_dict(orient="list")

# View output
for row in list_dict:
    print(row)

This will output:

{'Col A': 'A,0', 'Col B': 'B,0', 'Col C': 'C,0'}
{'Col A': 'A,1', 'Col B': 'B,1', 'Col C': 'C,1'}
{'Col A': 'A,2', 'Col B': 'B,2', 'Col C': 'C,2'}
Change Log:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

  • Use index to return index data, otherwise current instance display_index value will be used.

  • For 'list' orientation, data returned is JSON-like in structure, where each row has its own “column”: “value” data.

to_excel(filename: str, worksheet: int | str = 0, index: bool = False) None

Writes the current SQLDataModel to the specified Excel filename.

Parameters:
  • filename (str) – The file path to save the Excel file, e.g., filename = 'output.xlsx'.

  • worksheet (int | str, optional) – The index or name of the worksheet to write to. Defaults to 0, indicating the first worksheet.

  • index (bool, optional) – If SQLDataModel index should be included in the output. Default is False.

Raises:
  • ModuleNotFoundError – If the required package openpyxl is not installed as determined by _has_xl flag.

  • TypeError – If the filename argument is not of type ‘str’ representing a valid Excel file path to create or write to.

  • IndexError – If worksheet is provided as type ‘int’ but is out of range of the available worksheets.

  • Exception – If any unexpected exception occurs during the Excel writing and saving process.

Returns:

If successful, a new Excel file filename is created and None is returned.

Return type:

None

Example:

import openpyxl
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Rate', 'Gender']
data = [
    ('Alice', 25, 26.50, 'Female'),
    ('Bob', 30, 21.25, 'Male'),
    ('Will', 35, 24.00, 'Male'),
    ('Mary', 32, 23.75, 'Female')
]

# Create the model
sdm = SQLDataModel(data, headers)

# Export into a new Excel file
sdm.to_excel('Team-Overview.xlsx')

# Or append to existing Excel file as a new worksheet
sdm.to_excel('Team.xlsx', worksheet='Demographics')

This will create a new Excel file Team-Overview.xlsx:

    ┌───────┬──────┬────────┬────────┐
    │ A     │  B   │ C      │ D      │
┌───┼───────┼──────┼────────┼────────┤
│ 1 │ Name  │  Age │ Gender │   Rate │
│ 2 │ Alice │   25 │ Female │  26.50 │
│ 3 │ Mary  │   32 │ Female │  23.75 │
│ 4 │ Bobby │   30 │ Male   │  21.25 │
│ 5 │ Will  │   35 │ Male   │  24.00 │
└───┴───────┴──────┴────────┴────────┘
[ Sheet1 ]
Change Log:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

  • If provided filename does not exist, it will be created. If it already exists, model data will be appended to the existing worksheet contents.

  • When providing a string argument for worksheet, if the sheet does not exist, it will be created. However if providing an integer index for an out of range sheet, an IndexError will be raised.

  • See related SQLDataModel.from_excel() for creating a SQLDataModel from existing Excel content.

to_html(filename: str = None, index: bool = None, encoding: str = 'utf-8', style_params: dict = None) str

Returns the current SQLDataModel as a lightly formatted HTML <table> element as a string if filename is None. If filename is specified, writes the HTML to the specified file as .html and returns None.

Parameters:
  • filename (str) – The file path to save the HTML content. If None, returns the HTML as a string (default is None).

  • index (bool) – Whether to include the index column in the HTML table (default is current display_index).

  • encoding (str) – Character encoding to use when writing model to HTML file, default set to 'utf-8'.

  • style_params (dict) – A dictionary representing CSS styles {property: value} to customize the appearance of the HTML table (default is None).

Raises:
  • TypeError – If filename is not a valid string when specified or if style_params is not a dictionary when specified.

  • OSError – If encountered while trying to open and write the HTML to the file.

Returns:

If filename is None, returns the HTML content as a string. If filename is specified, writes to the file and returns None.

Return type:

str | None

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel(data=[(1, 'John'), (2, 'Doe')], headers=['ID', 'Name'])

# Create and save as new html file
sdm.to_html('output.html', style_params={'font-size': '12pt'})

# Get HTML as a string
html_string = sdm.to_html()

# View output
print(html_string)

This will output:

<table>
    <tr>
        <th>ID</th>
        <th>Name</th>
    </tr>
    <tr>
        <td>1</td>
        <td>John</td>
    </tr>
    <tr>
        <td>2</td>
        <td>Doe</td>
    </tr>
</table>
<style>
    table {font:size: 12pt;}
</style>
Change Log:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

  • Base styles are applied to reflect the styling of SQLDataModel in the terminal, including any display_color which is applied to the table CSS.

  • Table index is determined by the instance display_index attribute unless specified in the argument of the same name, overriding the instance attribute.

  • The default background-color is #E5E5E5, and the default font color is #090909, with 1 px solid border to mimic the repr for the instance.

to_json(filename: str = None, index: bool = None, **kwargs) list | None

Converts the SQLDataModel instance to JSON format. If filename is specified, the JSON is written to the file; otherwise, a JSON-like object is returned.

Parameters:
  • filename (str) – The path to the file where JSON will be written. If None, no file is created and JSON-like object is returned.

  • index (bool) – Whether to include the index column in the JSON. Defaults to the display_index property.

  • **kwargs – Additional keyword arguments to pass to the json.dump() method.

Raises:
  • TypeError – If filename is not of type ‘str’.

  • Exception – If there is an OS related error encountered when opening or writing to the provided filename.

Returns:

If filename is None, a list containing a JSON-like object is returned. Otherwise JSON file created and returns None.

Return type:

list | None

Examples:

To JSON Literal
from SQLDataModel import SQLDataModel

# Sample JSON to first create model
json_source = [
    {"id": 1, "color": "red", "value": "#f00", "notes": "primary"}
    ,{"id": 2, "color": "green", "value": "#0f0", "notes": None}
    ,{"id": 3, "color": "blue", "value": "#00f", "notes": "primary"}
]

# Create the model
sdm = SQLDataModel.from_json(json_source)

# View current state
print(sdm)

This will output:

┌─────┬───────┬───────┬─────────┐
│  id │ color │ value │ notes   │
├─────┼───────┼───────┼─────────┤
│   1 │ red   │ #f00  │ primary │
│   2 │ green │ #0f0  │         │
│   3 │ blue  │ #00f  │ primary │
└─────┴───────┴───────┴─────────┘
[3 rows x 4 columns]
Write JSON File
# Write model to JSON file
sdm.to_json('output.json')

# Or convert to JSON-like object
json_data = sdm.to_json()

# View JSON object
print(json_data)

This will output:

[{
    "id": 1,
    "color": "red",
    "value": "#f00",
    "notes": "primary"
},
{
    "id": 2,
    "color": "green",
    "value": "#0f0",
    "notes": null
},
{
    "id": 3,
    "color": "blue",
    "value": "#00f",
    "notes": "primary"
}]
Change Log:
  • Version 0.3.2 (2024-04-02):
    • Changed return object to JSON string literal when filename=None to convert to valid literal object.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

  • When no filename is specified, JSON-like object will be returned as a rowwise array.

  • Any nested structure will be flattened by this method as well as the SQLDataModel.from_json() method.

to_latex(filename: str = None, index: bool = False, bold_headers: bool = False, min_column_width: int = None, max_column_width: int = None, format_output_as: Literal['table', 'document'] = 'table', column_alignment: Literal['left', 'center', 'right', 'dynamic'] = None) str | None

Returns the current SQLDataModel as a LaTeX table string if filename is None, otherwise writes the table to the provided file as a LaTeX document.

Parameters:
  • filename (str, optional) – The name of the file to write the LaTeX content. If not provided, the LaTeX content is returned as a string. Default is None.

  • index (bool, optional) – Whether to include the index column in the LaTeX output. Default is False.

  • bold_headers (bool, optional) – Whether the headers should be bolded in the LaTeX table. Default is False.

  • min_column_width (int, optional) – The minimum column width for table cells. Default is current value set on attribute SQLDataModel.min_column_width.

  • max_column_width (int, optional) – The maximum column width for table cells. Default is current value set on attribute SQLDataModel.max_column_width.

  • format_output_as (Literal['table', 'document']), optional) – Whether the output should be formatted as a LaTeX table or as a standalone document. Default is ‘table’.

  • column_alignment (Literal['left', 'center', 'right', 'dynamic'], optional) – The alignment for table columns. Default is current value set on attribute SQLDataModel.column_alignment.

Returns:

If filename is None, returns the LaTeX table as a string. None: If filename is provided, writes the LaTeX table to the specified file and returns None.

Return type:

str

Raises:
  • TypeError – If the filename argument is not of type ‘str’, index argument is not of type ‘bool’, min_column_width or max_column_width argument is not of type ‘int’.

  • ValueError – If format_output_as is not one of ‘table’, ‘document’, or column_alignment provided and is not one of ‘left’, ‘center’, ‘right’, ‘dynamic’.

  • Exception – If there is an OS related error encountered when opening or writing to the provided filename.

LaTeX Formatting:
  • LaTeX output format that is generated can be set by format_output_as which provides one of two formats:

    • 'table': Output formatted as insertable table, beginning and ending with LaTeX \begin{table} and \end{table} respectively.

    • 'document': Output formatted as standalone document, beginning and ending with LaTeX \begin{document} and \end{document} respectively.

  • LaTeX table alignment will follow the SQLDataModel instance alignment, set by SQLDataModel.set_column_alignment():

    • 'dynamic': Dynamically aligns column content, right for numeric types and left for remaining types.

    • 'left': Left-aligns all column content, equivalent to LaTeX column format: |l|.

    • 'center': Center-aligns all column content preferring left on uneven splits, equivalent to LaTeX column format: |c|.

    • 'right': Right-aligns all column content, equivalent to LaTeX column format: |r|.

  • The LaTeX rows generated will use dynamic alignment regardless of column_alignment provided, this will not affect the rendered alignment but will maintain consistent format without affecting the actual alignment rendered by LaTeX.

Examples:

Returning LaTeX Literal
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data=data, headers=headers)

# Generate LaTeX table literal
latex_output = sdm.to_latex()

# View LaTeX output
print(latex_output)

This will output:

\begin{tabular}{|l|r|r|}
\hline
    {Name} & {Age} & {Height} \
\hline
    John    &   30 &  175.30 \
    Alice   &   28 &  162.00 \
    Michael &   35 &  185.80 \
\hline
\end{tabular}
Write to LaTeX File
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data=data, headers=headers)

# Write the output to the file, formatting the output as a proper LaTeX document
latex_table = sdm.to_latex(filename='Table.tex', format_output_as='document')

Contents of file Table.tex:

\documentclass{article}
\begin{document}
\begin{table}[h]
\centering
\begin{tabular}{|l|r|r|}
\hline
    {Name} & {Age} & {Height} \
\hline
    John    &   30 &  175.30 \
    Alice   &   28 &  162.00 \
    Michael &   35 &  185.80 \
\hline
\end{tabular}
\end{table}
\end{document}
Change Log:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

  • A \centering command is included in the LaTeX output by default regardless of alignments specified.

  • LaTeX headers and rows are indented by four spaces to keep with conventional table syntax and to distinguish the table data from commands.

to_list(index: bool = False, include_headers: bool = False) list

Returns the current SQLDataModel data as a 1-dimensional list of values if data dimensions are compatible with flattening, or as a list of lists if data is 2-dimensional. Data is returned without index or headers by default, use index = True or include_headers = True to modify.

Parameters:
  • index (bool, optional) – If True, includes the index in the result, if False, excludes the index. Default is False.

  • include_headers (bool, optional) – If True, includes column headers in the result, if False, excludes headers. Default is False.

Returns:

The flattened list of values corresponding to the model data.

Return type:

list

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('Beth', 27, 172.4),
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data, headers)

# Get all model data as a list of lists
model_data = sdm.to_list()

# Iterate over each row
for row in model_data:
    print(row)

This will output:

['Beth', 27, 172.4]
['John', 30, 175.3]
['Alice', 28, 162.0]
['Travis', 35, 185.8]

Data will be flattened into a single dimension if possible, such as when accessing individual columns:

# Get 'Name' column as a list
col_data = sdm['Name'].to_list()

# View output
print(col_data)

This will output a list containing the values from each row for the column:

['Beth', 'John', 'Alice', 'Travis']

Data will also be flattened when accessing individual rows:

# Get first row as a list with index
row_data = sdm[0].to_list(index=True)

# View result
print(row_data)

This will output the row’s values including the index:

[0, 'Beth', 27, 172.4]
Change Log:
  • Version 0.5.0 (2024-05-09):
    • Modified behavior to output 1-dimensional list when possible and a list of lists when not possible.

    • Changed default to index = False to increase surface for 1-dimensional flattening.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

to_local_db(filename: str) None

Writes the SQLDataModel in-memory database to disk as a SQLite database file using the specified filename.

Parameters:

filename (str) – The filename or filepath to use when writing the model to disk.

Raises:
  • TypeError – If filename is provided and is not of type ‘str’ representing a valid sqlite database save path.

  • sqlite3.Error – If there is an issue with the SQLite database operations during backup.

Returns:

None

Example:

import sqlite3
from SQLDataModel import SQLDataModel

# Sample data
data = [('Alice', 20, 'F'), ('Billy', 25, 'M'), ('Chris', 30, 'M')]

# Create the model
sdm = SQLDataModel(data, headers=['Name','Age','Sex'])

# Filename to use for database
db_file = "model.db"

# Write the in-memory database model to disk
sdm.to_local_db(db_file)

# Loading the model back from disk can now be done at anytime
sdm = SQLDataModel.from_sql("sdm", sqlite3.connect(db_file))

# View restored model
print(sdm)

This will output the model we originally created:

┌───┬───────┬─────┬─────┐
│   │ Name  │ Age │ Sex │
├───┼───────┼─────┼─────┤
│ 0 │ Alice │  20 │ F   │
│ 1 │ Billy │  25 │ M   │
│ 2 │ Chris │  30 │ M   │
└───┴───────┴─────┴─────┘
[3 rows x 3 columns]
Change Log:
  • Version 0.5.2 (2024-05-13):
    • Renamed db parameter to filename for package consistency and to avoid confusion between similarily named database objects.

    • Changed filename from keyword to positional argument making it a required parameter to avoid accidental overwriting.

Note

to_markdown(filename: str = None, index: bool = False, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = None) str | None

Returns the current SQLDataModel as a markdown table literal if filename is None, otherwise writes the table to the provided file as markdown.

Parameters:
  • filename (str, optional) – The name of the file to write the Markdown content. If not provided, the Markdown content is returned as a string. Default is None.

  • index (bool, optional) – Whether to include the index column in the Markdown output. Default is False.

  • min_column_width (int, optional) – The minimum column width for table cells. Default is current value set on SQLDataModel.min_column_width.

  • max_column_width (int, optional) – The maximum column width for table cells. Default is current value set on SQLDataModel.max_column_width.

  • float_precision (int, optional) – The precision for floating-point values. Default is current value set on SQLDataModel.display_float_precision.

  • column_alignment (Literal['dynamic', 'left', 'center', 'right'], optional) – The alignment for table columns. Default is current value set on SQLDataModel.column_alignment. 'dynamic': Dynamically aligns column content, right for numeric types and left for remaining types. 'left': Left-aligns all column content. 'center': Center-aligns all column content preferring left on uneven splits. 'right': Right-aligns all column content.

Raises:
  • TypeError – If the filename argument is not of type ‘str’, index argument is not of type ‘bool’, min_column_width or max_column_width argument is not of type ‘int’.

  • ValueError – If the column_alignment argument is provided and is not one of ‘dynamic’, ‘left’, ‘center’, or ‘right’.

  • Exception – If there is an OS related error encountered when opening or writing to the provided filename.

Returns:

If filename is None, returns the Markdown table as a string, if filename is provided, writes the Markdown table to the specified file and returns None.

Return type:

str or None

Column Alignment:
  • 'dynamic': Dynamically aligns column content, right for numeric types and left for remaining types.

  • 'left': Left-aligns all column content.

  • 'center': Center-aligns all column content preferring left on uneven splits.

  • 'right': Right-aligns all column content.

Examples:

To Markdown Literal
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data=data, headers=headers)

# Generate markdown table literal
markdown_table = sdm.to_markdown()

# View markdown output
print(markdown_table)

This will output:

| Name    |  Age |  Height |
|:--------|-----:|--------:|
| John    |   30 |  175.30 |
| Alice   |   28 |  162.00 |
| Michael |   35 |  185.80 |
Write to Markdown File
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data=data, headers=headers)

# Write the output to the file, center-aligning all columns
sdm.to_markdown(filename='Table.MD', column_alignment='center')

Contents of Table.MD:

| Name    |  Age |  Height |
|:--------|-----:|--------:|
| John    |   30 |  175.30 |
| Alice   |   28 |  162.00 |
| Michael |   35 |  185.80 |
Change Log:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

to_numpy(index: bool = False, include_headers: bool = False) ndarray

Converts SQLDataModel to a NumPy ndarray object of shape (rows, columns). Note that the numpy package must be installed to use this method.

Parameters:
  • index (bool, optional) – If True, includes the model index in the result. Default is False.

  • include_headers (bool, optional) – If True, includes column headers in the result. Default is False.

Raises:

ModuleNotFoundError – If NumPy is not installed.

Returns:

The model’s data converted into a NumPy array.

Return type:

numpy.ndarray

Example:

import numpy
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the numpy array with default parameters, no indicies or headers
result_array = sdm.to_numpy()

# View array
print(result_array)

This will output:

[['John' '30' '175.3']
 ['Alice' '28' '162.0']
 ['Travis' '35' '185.8']]

Model headers can also be retained:

# Create the numpy array with with indicies and headers
result_array = sdm.to_numpy(index=True, include_headers=True)

# View array
print(result_array)

This will output:

[['idx' 'Name' 'Age' 'Height']
 ['0' 'John' '30' '175.3']
 ['1' 'Alice' '28' '162.0']
 ['2' 'Travis' '35' '185.8']]
Change Log:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

  • Output will always be a 2-dimensional array of type numpy.ndarray

to_pandas(index: bool = False, include_headers: bool = True) DataFrame

Converts SQLDataModel to a Pandas DataFrame object. Note that the pandas package must be installed to use this method.

Parameters:
  • index (bool, optional) – If True, includes the model index in the result. Default is False.

  • include_headers (bool, optional) – If True, includes column headers in the result. Default is True.

Raises:

ModuleNotFoundError – If Pandas is not installed.

Returns:

The model’s data converted to a Pandas DataFrame.

Return type:

pandas.DataFrame

Example:

import pandas
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data, headers)

# Convert the model to a pandas df
df = sdm.to_pandas(include_headers=True, index=True)

# View result
print(df)

This will output:

    Name  Age  Height
0    John   30   175.3
1   Alice   28   162.0
2  Travis   35   185.8
Change Log:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

to_parquet(filename: str, **kwargs) None

Writes the current SQLDataModel to the specified parquet filename.

Parameters:
  • filename (str) – The file path to save the parquet file, e.g., filename = 'user/data/output.parquet'.

  • **kwargs – Additional keyword arguments to pass to the pyarrow write_table function.

Raises:
  • ModuleNotFoundError – If the required package pyarrow is not installed as determined by _has_pa flag.

  • TypeError – If the filename argument is not of type ‘str’ representing a valid parquet file path.

  • Exception – If any unexpected exception occurs during the parquet writing process.

Returns:

If successful, a new parquet file filename is created and None is returned.

Return type:

None

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Rate']
data = [('Alice', 25, 26.50), ('Bob', 30, 21.25), ('Will', 35, 24.00)]

# Create the model
sdm = SQLDataModel(data,headers, display_index=False)

# Parquet file
pq_file = "output.parquet"

# Write the model as parquet file
sdm.to_parquet(pq_file)

# Confirm result by reading back file
sdm_result = SQLDataModel.from_parquet(pq_file)

# View model
print(sdm_result)

This will output:

┌───────┬──────┬────────┐
│ Name  │  Age │   Rate │
├───────┼──────┼────────┤
│ Alice │   25 │  26.50 │
│ Bob   │   30 │  21.25 │
│ Will  │   35 │  24.00 │
└───────┴──────┴────────┘
[3 rows x 3 columns]

Note

  • The pyarrow package is required to use this method as well as the SQLDataModel.from_parquet() method.

  • The SQLDataModel.to_dict() method is used prior to writing to parquet to convert the SQLDataModel into a dictionary suitable for parquet Table format.

  • Exceptions raised by the pyarrow package and its methods are caught and reraised when encountered to keep with package error formatting.

to_pickle(filename: str = None) None

Save the SQLDataModel instance to the specified filename as a pickle object.

Parameters:

filename (str, optional) – The file name to save the model to. If None, the invoking Python file’s name with a “.sdm” extension will be used.

Raises:

TypeError – If filename is provided but is not of type ‘str’ representing a valid pickle filepath.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27)
    ,(1, 'sarah', 'west', 29)
    ,(2, 'mike', 'harlin', 36)
    ,(3, 'pat', 'douglas', 42)
]

# Create the SQLDataModel object
sdm = SQLDataModel(data, headers)

# Save the model's data as a pickle file "output.sdm"
sdm.to_pickle("output.sdm")

# Alternatively, leave blank to use the current file's name:
sdm.to_pickle()

# This way the same data can be recreated later by calling the from_pickle() method from the same project:
sdm = SQLDataModel.from_pickle()

Note

  • All data, headers, data types and display properties will be saved when pickling.

  • If no filename argument is provided, then the invoking module’s __name__ property will be used by default.

to_polars(index: bool = False, include_headers: bool = True) _pl.DataFrame

Converts SQLDataModel to a Polars DataFrame object. Note that the polars package must be installed to use this method.

Parameters:
  • index (bool, optional) – If True, includes the model index in the result. Default is False.

  • include_headers (bool, optional) – If True, includes column headers in the result. Default is True.

Raises:

ModuleNotFoundError – If Polars is not installed.

Returns:

The model’s data converted to a Polars DataFrame.

Return type:

polars.DataFrame

Example:

import polars
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('Beth', 27, 172.4),
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data, headers)

# Convert the model to a polars df with the index
df = sdm.to_polars(index=True)

# View result
print(df)

This will output:

shape: (4, 4)
┌─────┬────────┬─────┬────────┐
│ idx ┆ Name   ┆ Age ┆ Height │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ i64 ┆ str    ┆ i64 ┆ f64    │
╞═════╪════════╪═════╪════════╡
│ 0   ┆ Beth   ┆ 27  ┆ 172.4  │
│ 1   ┆ John   ┆ 30  ┆ 175.3  │
│ 2   ┆ Alice  ┆ 28  ┆ 162.0  │
│ 3   ┆ Travis ┆ 35  ┆ 185.8  │
└─────┴────────┴─────┴────────┘

Note

  • See related SQLDataModel.from_polars() for the inverse method of converting a Polars DataFrame object into to a SQLDataModel.

  • SQLDataModel uses different data types than those used in polars, see SQLDataModel.set_column_dtypes() for more information about casting rules.

  • Polars does not really have a concept of an index column, therefore when using index=True, the SQLDataModel index is just an additional column in the returned DataFrame object.

to_pyarrow(index: bool = False) Table

Returns the current SQLDataModel in Apache Arrow columnar format as a pyarrow.Table.

Parameters:

index (bool, optional) – Specifies whether to include the index of the SQLDataModel in the resulting Table. Default is to False.

Raises:
  • ModuleNotFoundError – If the required package pyarrow is not installed.

  • Exception – If any unexpected exception occurs during the pyarrow conversion process.

Returns:

A table representing the current SQLDataModel in Apache Arrow columnar format.

Return type:

pyarrow.Table

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Grade']
data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2)]

# Create the model
sdm = SQLDataModel(data, headers)

# Create the pyarrow table
table = sdm.to_pyarrow()

# View result
print(table)

This will output the pyarrow object details:

pyarrow.Table
Name: string
Age: int64
Grade: double
----
Name: [["Alice","Bob","Charlie"]]
Age: [[25,30,35]]
Grade: [[3.8,3.9,3.2]]
Change Log:
  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

  • Unmodified python types will follow conversion and casting rules specified in pyarrow implementation, for the modified date and datetime types, date32[day] and timestamp[us] will be used, respectively.

to_sql(table: str, con: Connection | Any, *, schema: str = None, if_exists: Literal['fail', 'replace', 'append'] = 'fail', index: bool = True, primary_key: str | int = None) None

Insert the SQLDataModel into the specified table using the provided database connection.

Supported Connection APIs:
  • SQLite using sqlite3

  • PostgreSQL using psycopg2

  • SQL Server ODBC using pyodbc

  • Oracle using cx_Oracle

  • Teradata using teradatasql

Parameters:
  • table (str) – The name of the table where data will be inserted.

  • con (sqlite3.Connection | Any) – The database connection object. Supported connection APIs are sqlite3, psycopg2, pyodbc, cx_Oracle, teradatasql

  • schema (str, optional) – The schema to use for PostgreSQL and ODBC SQL Server connections, ignored otherwise. Default is None.

  • if_exists (Literal['fail', 'replace', 'append'], optional) – Action to take if the table already exists. If fail an error is raised if table exists and no inserts occur. If replace any existing table is dropped prior to inserts. If append existing table is appended to by subsequent inserts.

  • index (bool, optional) – If the model index should be included in the target table. Default is True.

  • primary_key (str | int, optional) – Column name or index to use as table primary key. Default is None, using the index column as the primary key when index=True.

Raises:
  • SQLProgrammingError – If an error occurs during cursor accessing, table creation or data insertion into the database.

  • ValueError – If specified table already exists when using if_exists='fail' or if con is not one of the currently supported connection modules.

  • IndexError – If primary_key is provided as an int representing a column index but is out of range of the current model SQLDataModel.column_count.

  • TypeError – If primary_key argument provided is not of type ‘str’ or ‘int’ representing a valid column name or index to use as the primary key column for the target table.

Returns:

None

Example:

import sqlite3
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Grade']
data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2), ('David', 28, 3.4)]

# Create the model
sdm = SQLDataModel(data, headers)

# Create connection object
sqlite_db_conn = sqlite3.connect('students.db')

# Basic usage, creating a new table
sdm.to_sql('users', sqlite_db_conn)

This will create a new table users, or fail if one already exists:

sqlite> select * from users;

idx  Name     Age  Grade
---  -------  ---  -----
0    Alice    25   3.8
1    Bob      30   3.9
2    Charlie  35   3.2
3    David    28   3.4

Connect to PostgreSQL, SQL Server, Oracle or Teradata:

import psycopg2
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Grade']
data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2), ('David', 28, 3.4)]

# Create the model
sdm = SQLDataModel(data, headers)

# Setup the connection, whether using psycopg2 or other supported modules like pyodbc
con = psycopg2.connect(...)

# Create or replace existing table in database
sdm.to_sql('users', con, if_exists='replace', index=False)

This will result in a new table users in our PostgreSQL database:

=> select * from users;

Name    | Age | Grade |
--------+-----+-------+
Alice   |  25 |   3.8 |
Bob     |  30 |   3.9 |
Charlie |  35 |   3.2 |
David   |  28 |   3.4 |

For SQL Server connections using pyodbc, the example would be almost identical except for which con object we use:

import pyodbc

# For SQL Server ODBC connections using pyodbc
con = pyodbc.connect(...)

The same is true for Oracle and other connections:

import cx_Oracle

# For Oracle connections using cx_Oracle
con = cx_Oracle.connect(...)

Using a Primary Key

from SQLDataModel import SQLDataModel

# Sample data
headers = ['ID', 'User']
data = [(1001, 'Alice'), (1002, 'Bob'), (1003, 'Charlie'), (1004, 'David')]

# Create the model
sdm = SQLDataModel(data, headers)

# Create connection object
sqlite_db_conn = sqlite3.connect('students.db')

# Create the table using the 'ID' column as the primary key
sdm.to_sql('users', sqlite_db_conn, if_exists='replace', index=False, primary_key='ID')

This will create a users table with the schema:

sqlite> .schema users

CREATE TABLE "users" ( "ID" INTEGER PRIMARY KEY,  "User" TEXT);

With the ID column as its primary key:

sqlite> select * from users;

ID    User
----  -------
1001  Alice
1002  Bob
1003  Charlie
1004  David

If table creation is necessary, column types will be mapped according to the destination database by the following conversion:

┌─────────────────┬─────────┬─────────┬────────┬─────────┬────────────────┬──────┬───────────┐
│ Database \ Type │ NULL    │ INTEGER │ REAL   │ TEXT    │ BLOB           │ DATE │ TIMESTAMP │
├─────────────────┼─────────┼─────────┼────────┼─────────┼────────────────┼──────┼───────────┤
│ PostgreSQL      │ UNKNOWN │ INTEGER │ FLOAT  │ TEXT    │ BYTEA          │ DATE │ TIMESTAMP │
│ SQL Server ODBC │ UNKNOWN │ INTEGER │ FLOAT  │ TEXT    │ VARBINARY(MAX) │ DATE │ DATETIME  │
│ Oracle          │ UNKNOWN │ NUMBER  │ NUMBER │ VARCHAR │ BLOB           │ DATE │ DATETIME  │
│ Teradata        │ UNKNOWN │ INTEGER │ FLOAT  │ VARCHAR │ BYTE           │ DATE │ DATETIME  │
│ SQLite          │ NULL    │ INTEGER │ REAL   │ TEXT    │ BLOB           │ DATE │ TIMESTAMP │
└─────────────────┴─────────┴─────────┴────────┴─────────┴────────────────┴──────┴───────────┘
[5 rows x 8 columns]
Change Log:
  • Version 0.3.0 (2024-03-31):
    • Renamed arguments extern_con: con, replace_existing: if_exists, include_index: index.

    • Added primary_key argument for specifying a primary key column for table schema.

    • Added schema argument for specifying a target schema for the table.

Note

  • When providing a primary_key column it will be assumed unique and the model will not perform any unique-ness constraints.

  • Any con or connection object provided is assumed to be open and valid, if a cursor cannot be created from the object an exception will be raised.

  • Connections with write access can be used in the SQLDataModel.to_sql() method for writing to the same connection types, be careful.

  • ValueError will be raised if table already exists, use if_exists = 'replace' or if_exists = 'append' to instead replace or append to the table.

  • See relevant module documentation for additional details or information pertaining to specific database or connection dialect being used.

  • See related SQLDataModel.from_sql() for creating SQLDataModel from existing SQL database connections.

to_text(filename: str = None, index: bool = None, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = None, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round'] = None, display_dimensions: bool = False) str | None

Returns a textual representation of the current SQLDataModel as a string literal or by writing to file if a filename is provided.

Parameters:
  • filename (str, optional) – The name of the file to write the text content. If provided, writes the text to the specified file. Default is None.

  • index (bool, optional) – Whether to include the index column in the text output. Default is value set on SQLDataModel.display_index.

  • min_column_width (int, optional) – The minimum column width for table cells. Default is value set on SQLDataModel.min_column_width.

  • max_column_width (int, optional) – The maximum column width for table cells. Default is value set on SQLDataModel.max_column_width.

  • float_precision (int, optional) – The precision for floating-point values. Default is value set on SQLDataModel.display_float_precision.

  • column_alignment (Literal['dynamic', 'left', 'center', 'right'], optional) – The alignment for table columns. Default is value set on SQLDataModel.column_alignment. Use 'dynamic' dynamically aligns column content, right for numeric types and left for remaining types. Use 'left' left-aligns all column content. Use 'center' center-aligns all column content. Use 'right' right-aligns all column content.

  • table_style (Literal['ascii','bare','dash','default','double','list','markdown','outline','pandas','polars','postgresql','round'], optional) – The table styling to use. Default is value set on SQLDataModel.table_style.

  • display_dimensions (bool, optional) – Whether to include the model dimensions [N rows x N cols] in the text output. Default is False.

Raises:
  • TypeError – If arguments are provided but are not the correct types: filename (str), index (bool), min_column_width (int), max_column_width (int), float_precision (int).

  • ValueError – If the column_alignment argument is provided and is not one of ‘dynamic’, ‘left’, ‘center’, or ‘right’.

  • Exception – If there is an OS related error encountered when opening or writing to the provided filename.

Returns:

If filename is None, returns the textual representation as a string. If filename is provided, writes the textual representation to the specified file and returns None.

Return type:

str or None

Examples:

Returning Text Literal
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data=data, headers=headers)

# Generate text table literal
text_table = sdm.to_text()

# View output
print(text_table)

This will output:

┌─────────┬──────┬────────┐
│ Name    │  Age │ Height │
├─────────┼──────┼────────┤
│ John    │   30 │  175.3 │
│ Alice   │   28 │  162.0 │
│ Michael │   35 │  185.8 │
└─────────┴──────┴────────┘
Write to File
from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
sdm = SQLDataModel(data=data, headers=headers)

# Write the output to the file, center-aligning all columns
sdm.to_text(filename='Table.txt', column_alignment='center')

Contents of Table.txt:

┌───┬─────────┬──────┬────────┐
│   │  Name   │ Age  │ Height │
├───┼─────────┼──────┼────────┤
│ 0 │  John   │  30  │ 175.3  │
│ 1 │  Alice  │  28  │ 162.0  │
│ 2 │ Michael │  35  │ 185.8  │
└───┴─────────┴──────┴────────┘

Important

Unlike output from print(sdm) or other calls to SQLDataModel.__repr__(), the output from this method includes the full SQLDataModel and is not restricted by current terminal size or the value set at SQLDataModel.display_max_rows. As such, horizontal truncation only occurs on cell values as determined by max_column_width and no other horizontal or vertical table-wide truncation is performed.

Change Log:
  • Version 0.3.10 (2024-04-16):
    • Added table_style parameter and updated output to reflect new formatting styles introduced in version 0.3.9.

    • Added display_dimensions parameter to allow toggling display of table dimensions in output.

  • Version 0.3.0 (2024-03-31):
    • Renamed include_index parameter to index for package consistency.

Note

transpose(infer_types: bool = True, include_headers: bool = False) SQLDataModel

Transposes the model and returns as a new SQLDataModel.

Parameters:
  • infer_types (bool, optional) – If types should be inferred after the transposition. Defaults to True.

  • include_headers (bool, optional) – If headers are included in the transposed data. Defaults to False.

Returns:

The transposition of the model as a new SQLDataModel instance.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Create the model
sdm = SQLDataModel([('A1', 'A2'), ('B1', 'B2'), ('C1', 'C2')])

# Transpose it
sdm_transposed = sdm.transpose()

# View original
print(f"Original:\n{sdm}")

# Along with transposed
print(f"Transposed:\n{sdm_transposed}")

This will output the result of the transposition:

Original:
┌───┬─────┬─────┐
│   │ 0   │ 1   │
├───┼─────┼─────┤
│ 0 │ A1  │ A2  │
│ 1 │ B1  │ B2  │
│ 2 │ C1  │ C2  │
└───┴─────┴─────┘
[3 rows x 2 columns]

Transposed:
┌───┬─────┬─────┬─────┐
│   │ 0   │ 1   │ 2   │
├───┼─────┼─────┼─────┤
│ 0 │ A1  │ B1  │ C1  │
│ 1 │ A2  │ B2  │ C2  │
└───┴─────┴─────┴─────┘
[2 rows x 3 columns]

Note

  • When infer_types=False, the first row of the transposed result will be used to set the dtypes of the new model. This is generally a poor choice considering the nature of transposing data.

  • If include_headers=True, the headers will be included as the first row in the transposed data.

  • Running this method sequentially should return the original model, sdm == sdm.transpose().transpose()

update_index_at(row_index: int, column_index: int | str, value: Any = None) None

Updates a specific cell in the SQLDataModel at the given row and column indices with the provided value.

Parameters:
  • row_index (int) – The index of the row to be updated.

  • column_index (int or str) – The index or name of the column to be updated.

  • ``value

    The new value to be assigned to the specified cell.

Raises:
  • TypeError – If row_index is not of type ‘int’ or if column_index is not of type ‘int’ or ‘str’.

  • IndexError – If row or column provided as an ‘int’ but is outside of the current model row or column range.

  • ValueError – If column provided as a ‘str’ but is not found in the current model headers.

  • SQLProgrammingError – If there is an issue with the SQL execution during the update.

Returns:

None

Example:

from SQLDataModel import SQLDataModel

# Create an initial 3x3 model filled with dashes
sdm = SQLDataModel.from_shape((3,3), fill='---', headers=['A', 'B', 'C'])

# Update cell based on integer indicies
sdm.update_index_at(0, 0, 'Top Left')
sdm.update_index_at(0, 2, 'Top Right')

# Update cell based on row index and column name
sdm.update_index_at(2, 'A', 'Bottom Left')
sdm.update_index_at(2, 'C', 'Bottom Right')

# Update based on negative row and column indexing
sdm.update_index_at(-2, -2, 'Center')

# View result
print(sdm)

This will output cumulative result of our updates:

┌───┬─────────────┬────────┬──────────────┐
│   │ A           │ B      │ C            │
├───┼─────────────┼────────┼──────────────┤
│ 0 │ Top Left    │ ---    │ Top Right    │
│ 1 │ ---         │ Center │ ---          │
│ 2 │ Bottom Left │ ---    │ Bottom Right │
└───┴─────────────┴────────┴──────────────┘
[3 rows x 3 columns]
Change Log:
  • Version 0.5.2 (2024-05-13):
    • Modified row_index parameter to use SQLDataModel.indicies to index into rows in lieu of row index value equality.

Important

  • Indexing is done using zero-based integers and not done by index value. Most of the time this distinction is irrelevant as the row index at position ‘0’ will have an index value of ‘0’, however this can change after transformation operations like filter or sort. To reset and realign the index value use SQLDataModel.reset_index() or use SQLDataModel.indicies to view the current row indicies.

Note

  • This method only updates individual cells in the current model based on integer indexing for both rows and columns using their (row, column) position.

  • To broadcast updates across row and column dimensions use the syntax of sdm[row, column] = value or see SQLDataModel.__setitem__() for more details.

validate_indicies(indicies) tuple[int | slice, list[str]]

Validates and returns a predictable notation form of indices for accessing rows and columns in the SQLDataModel from varying indexing input types.

Row indexing:
  • int: Single row index.

  • slice: Range of row indices.

  • set: Discontiguous row indicies.

  • tuple: Like set, discontiguous row indices.

Column indexing:
  • str: Single column name.

  • list: List of column names.

  • tuple[int|slice, str|list]: Two-dimensional indexing with rows and columns.

Parameters:

indicies – Specifies the indices for rows and columns.

Raises:
  • TypeError – If the type of indices is invalid such as a float for row index or a boolean for a column name index.

  • ValueError – If the indices are outside the current model range or if a column is not found in the current model headers when indexed by column name as str.

  • IndexError – If the column indices are outside the current column range or if a column is not found in the current model headers when indexed by int.

Returns:

A tuple containing validated row indices and column indices.

Return type:

tuple

Example:

from SQLDataModel import SQLDataModel

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27)
    ,(1, 'sarah', 'west', 29)
    ,(2, 'mike', 'harlin', 36)
    ,(3, 'pat', 'douglas', 42)
]

# Create the sample model
sdm = SQLDataModel(data,headers)

# Example 1: Validate a single row index
validated_row_index, validated_columns = sdm.validate_indicies(3)

# Example 2: Validate a range of row indices and a list of column names
validated_row_indices, validated_columns = sdm.validate_indicies((0, 2, 3), ['first', 'last'])

# Example 3: Validate a slice for row indices and a single column name
validated_row_indices, validated_columns = sdm.validate_indicies(slice(1, 2), 'col_3')

# Example 4: Validate two-dimensional indexing with rows and columns
validated_row_indices, validated_columns = sdm.validate_indicies((slice(0, 3), ['first', 'last']))

Note

  • For two-dimensional indexing, the first element represents rows, and the second element represents columns.

  • Strict validation ensures that column names are checked against the current model headers.

vstack(*other: SQLDataModel, inplace: bool = False) SQLDataModel

Vertically stacks one or more SQLDataModel objects to the current model.

Parameters:
  • other (SQLDataModel or sequence of) – The SQLDataModel objects to vertically stack.

  • inplace (bool, optional) – If True, performs the vertical stacking in-place, modifying the current model. Defaults to False, returning a new SQLDataModel.

Returns:

The vertically stacked SQLDataModel instance when inplace is False.

Return type:

SQLDataModel

Raises:
  • ValueError – If no additional SQLDataModels are provided for vertical stacking.

  • TypeError – If any argument in ‘other’ is not of type SQLDataModel, list, or tuple.

  • SQLProgrammingError – If an error occurs when updating the model values in place.

Example:

from SQLDataModel import SQLDataModel

# Create models A and B
sdm_a = SQLDataModel([('A', 1), ('B', 2)], headers=['A1', 'A2'])
sdm_b = SQLDataModel([('C', 3), ('D', 4)], headers=['B1', 'B2'])

# Vertically stack B onto A
sdm_ab = sdm_a.vstack(sdm_b)

# View stacked model
print(sdm_ab)

This will output the result of stacking B onto A, using the base model columns and dtypes:

┌─────┬─────┐
│ A1  │  A2 │
├─────┼─────┤
│ A   │   1 │
│ B   │   2 │
│ C   │   3 │
│ D   │   4 │
└─────┴─────┘
[4 rows x 2 columns]

Multiple models can be stacked simultaneously, here we vertically stack 3 models:

# Create a third model C
sdm_c = SQLDataModel([('E', 5), ('F', 6)], headers=['C1', 'C2'])

# Vertically stack all three models
sdm_abc = sdm_a.vstack([sdm_b, sdm_c])

# View stacked result
print(sdm_abc)

This will output the result of stacking C and B onto A:

┌─────┬─────┐
│ A1  │  A2 │
├─────┼─────┤
│ A   │   1 │
│ B   │   2 │
│ C   │   3 │
│ D   │   4 │
│ E   │   5 │
│ F   │   6 │
└─────┴─────┘
[6 rows x 2 columns]

Note

  • Headers and data types are inherited from the model calling the SQLDataModel.vstack() method, casting stacked values corresponding to the base model types.

  • Model dimensions will be truncated or padded to coerce compatible dimensions when stacking, use SQLDataModel.concat() for strict concatenation instead of vstack.

  • See SQLDataModel.insert_row() for inserting new values or types other than SQLDataModel directly into the current model.

  • See SQLDataModel.hstack() for horizontal stacking.

where(predicate: str) SQLDataModel

Filters the rows of the current SQLDataModel object based on the specified SQL predicate and returns a new SQLDataModel containing only the rows that satisfy the condition. Only the predicates are needed as the statement prepends the select clause as “select [current model columns] where [predicate]”, see below for detailed examples.

Parameters:

predicate (str) – The SQL predicate used for filtering rows that follows the ‘where’ keyword in a normal SQL statement.

Raises:
  • TypeError – If the provided predicate argument is not of type str.

  • SQLProgrammingError – If the provided string is invalid or malformed SQL when executed against the model

Returns:

A new SQLDataModel containing rows that satisfy the specified predicate.

Return type:

SQLDataModel

Example:

from SQLDataModel import SQLDataModel

# Sample data
headers = ['Name', 'Age', 'Job']
data = [
    ('Billy', 30, 'Barber'),
    ('Alice', 28, 'Doctor'),
    ('John', 25, 'Technician'),
    ('Travis', 35, 'Musician'),
    ('William', 15, 'Student')
]

# Create the model
sdm = SQLDataModel(data, headers)

# Filter model by 'Age' > 30
sdm_filtered = sdm.where('Age > 20')

# View result
print(sdm_filtered)

This will output:

┌───┬────────┬──────┬────────────┐
│   │ Name   │  Age │ Job        │
├───┼────────┼──────┼────────────┤
│ 0 │ Billy  │   30 │ Barber     │
│ 1 │ Alice  │   28 │ Doctor     │
│ 2 │ John   │   25 │ Technician │
│ 3 │ Travis │   35 │ Musician   │
└───┴────────┴──────┴────────────┘
[4 rows x 3 columns]

Filter by multiple parameters:

# Filter by 'Job' and 'Age'
sdm_filtered = sdm.where("Job = 'Student' and Age < 18")

# View result
print(sdm_filtered)

This will output:

┌───┬─────────┬──────┬─────────┐
│   │ Name    │  Age │ Job     │
├───┼─────────┼──────┼─────────┤
│ 4 │ William │   15 │ Student │
└───┴─────────┴──────┴─────────┘
[1 rows x 3 columns]

Note

  • predicate can be any valid SQL, for example ordering can be acheived without any filtering by simple using the argument '(1=1) order by "age" asc'

  • If predicate is not of type str, a TypeError is raised, if it is not valid SQL, SQLProgrammingError will be raised.

ANSIColor

class SQLDataModel.ANSIColor.ANSIColor(text_color: str | tuple = None, text_bold: bool = False)

Bases: object

Creates an ANSI style terminal color using provided hex color or rgb values.

Variables:
  • text_color (str or tuple) – Hex color code or RGB tuple.

  • text_bold (bool) – Whether text should be bold.

Raises:
  • ValueError – If provided string is not a valid hex color code or if provided rgb tuple is invalid.

  • TypeError – If provided text_color or text_bold parameters are of invalid types.

Example:

import ANSIColor

# Create a pen by specifying a color in hex or rgb:
green_bold = ANSIColor("#00ff00", text_bold=True)

# Create a string to use as a sample:
regular_str = "Hello, World!"

# Color the string using the ``wrap()`` method:
green_str = green_bold.wrap(regular_str)

# Print the string in the terminal to see the color applied:
print(f"original string: {regular_str}, green string: {green_str}")

# Get rgb values from existing color
print(green_bold.to_rgb())  # Output: (0, 255, 0)
static ErrorFormat(error: str) str

Formats an error message with ANSI color coding.

Parameters:

error (str) – The error message to be formatted.

Returns:

A string with ANSI color coding, highlighting the error type in bold red.

Return type:

str

Example:

import ANSIColor

# Error message to format
formatted_error = ANSIColor.ErrorFormat("ValueError: Invalid value provided.")

# Display alongside error or exception when raised
print(formatted_error)
__init__(text_color: str | tuple = None, text_bold: bool = False)

Initializes the ANSIColor object with the specified text color and bold setting.

Parameters:
  • text_color (str or tuple) – Hex color code or RGB tuple (default: None, teal color used if not specified)

  • text_bold (bool) – Whether text should be bold (default: False)

Example:

import ANSIColor

# Initialize from hex value with normal weight
color = ANSIColor("#00ff00")

# Initialize from rgb value with bold weight
color = ANSIColor((0,255,0), text_bold=True)
to_rgb() tuple

Returns the text color attribute as a tuple in the format (r, g, b).

Returns:

RGB tuple.

Return type:

tuple

Example:

import ANSIColor

# Create the color
color = ANSIColor("#00ff00")

# Get the rgb values
print(color.to_rgb())  # Output: (0, 255, 0)
wrap(text: str) str

Wraps the provided text in the style of the pen.

Parameters:

text (str) – Text to be wrapped.

Returns:

Wrapped text with ANSI escape codes.

Return type:

str

Example:

import ANSIColor

# Create the color
blue_color = ANSIColor("#0000ff")

# Create a sample string
message = "This string is currently unstyled"

# Wrap the string to change its styling whenever its printed
blue_message = blue_color.wrap(message)

# Print the styled message
print(blue_message)

# Or style string or string object directly in the print statement
print(blue_color.wrap("I'm going to turn blue!"))

HTMLParser

class SQLDataModel.HTMLParser.HTMLParser(*, convert_charrefs: bool = True, cell_sep: str = ' ', table_identifier: int | str = 0)

Bases: HTMLParser

An HTML parser designed to extract tables from HTML content.

This parser subclasses HTMLParser from the standard library to parse HTML content. It extracts tables from the HTML and provides methods to access the table data.

Variables:
  • convert_charrefs (bool) – Flag indicating whether to convert character references to Unicode characters. Default is True.

  • cell_sep (str) – Separator string to separate cells within a row. Default is an empty string.

  • table_identifier (int or str) – Identifier used to locate the target table. It can be either an integer representing the table index, or a string representing the HTML ‘name’ or ‘id’ attribute of the table.

  • _in_td (bool) – Internal flag indicating whether the parser is currently inside a <td> tag.

  • _in_th (bool) – Internal flag indicating whether the parser is currently inside a <th> tag.

  • _current_table (list) – List to hold the current table being parsed.

  • _current_row (list) – List to hold the current row being parsed.

  • _current_cell (list) – List to hold the current cell being parsed.

  • _ignore_next (bool) – Internal flag indicating whether the next token should be ignored.

  • found_target (bool) – Flag indicating whether the target table has been found.

  • _is_finished (bool) – Internal flag indicating whether parsing is finished.

  • table_counter (int) – Counter to keep track of the number of tables encountered during parsing.

  • target_table (list) – List to hold the data of the target table once found.

static ErrorFormat(error: str) str

Formats an error message with ANSI color coding.

Parameters:

error (str) – The error message to be formatted.

Returns:

A string with ANSI color coding, highlighting the error type in bold red.

Return type:

str

Example:

import HTMLParser

# Error message to format
formatted_error = HTMLParser.ErrorFormat("ValueError: Invalid value provided.")

# Display alongside error or exception when raised
print(formatted_error)
handle_data(data: str) None

Handle the data within an HTML tag during parsing.

Parameters:

data (str) – The data contained within the HTML tag.

handle_endtag(tag: str) None

Handle the end of an HTML tag during parsing and modify the parsing tags accordingly.

Parameters:

tag (str) – The name of the HTML tag encountered.

handle_starttag(tag: str, attrs: list[str]) None

Handle the start of an HTML tag during parsing.

Parameters:
  • tag (str) – The name of the HTML tag encountered.

  • attrs (list[str]) – A list of (name, value) pairs representing the attributes of the tag.

validate_table() None

Validate and retrieve the target HTML table data based on table_identifier used for parsing.

Returns:

A tuple containing the table data and headers (if present).

Return type:

tuple[list, list|None]

Raises:

ValueError – If the target table is not found or cannot be parsed.

Note

  • SQLDataModel.from_html() uses this class to extract valid HTML tables from either web or file content.

  • If a row is found with mismatched dimensions, it will be filled with None values to ensure tabular output.

JSONEncoder

class SQLDataModel.JSONEncoder.DataTypesEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

Custom JSON encoder that extends the functionality of json.JSONEncoder to handle additional data types.

Serialization:
  • datetime.date: Serialized as a string in the format ‘YYYY-MM-DD’.

  • datetime.datetime: Serialized as a string in the format ‘YYYY-MM-DD HH:MM:SS’.

  • bytes: Decoded to a UTF-8 encoded string.

Note

  • The date and datetime types can be recovered using SQLDataModels infer_dtypes() method.

  • The bytes information is not decoded back into bytes.

default(obj: Any)

Override the default method to provide custom serialization for specific data types.

Parameters:

obj – The Python object to be serialized.

Returns:

The JSON-serializable representation of the object.

StandardDeviation

class SQLDataModel.StandardDeviation.StandardDeviation

Bases: object

Implementation of standard deviation as an aggregate function for SQLite:

\[\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}\]
Where:
  • \(x_i\) represents each individual data point in the population.

  • \(\mu\) is the population mean.

  • \(N\) is the total number of data points in the population.

This class provides methods to calculate the standard deviation of a set of values in an SQLite query using the aggregate function mechanism.

Variables:
  • M (float) – The running mean of the values.

  • S (float) – The running sum of the squared differences from the mean.

  • k (int) – The count of non-null input values.

Note

finalize() float

Compute the final standard deviation as part of sqlite3 user-defined aggregate function.

Returns:

The computed standard deviation if the count is greater than or equal to 3, else None.

Return type:

float or None

Note

  • This returns the population standard deviation, not sample standard deviation. It measures of the spread or dispersion of a set of data points within the population, using the entire population.

step(value)

Update the running mean and sum of squared differences with a new value.

Parameters:

value (float) – The input value to be included in the calculation.

Note

  • If the input value is None, it will be ignored.

Exceptions

exception SQLDataModel.exceptions.DimensionError

Bases: Exception

Raised when arguments provided to SQLDataModel are not of compatible dimensions, for example, trying to join a (10, 4) dimensional model to a (7, 5) dimensional model.

Variables:

message (str) – A detailed error message describing the dimension mismatch.

Example:

from SQLDataModel import SQLDataModel

# Example headers and data
headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27)
    ,(1, 'sarah', 'west', 29)
    ,(2, 'mike', 'harlin', 36)
    ,(3, 'pat', 'douglas', 42)
]

# Create the sample model with correct dimensions
sdm = SQLDataModel(data,headers)

# This time with one less header, which raises `DimensionError` exception:
try:
    sdm = SQLDataModel(data,headers[:-1])
except DimensionError as e:
    print(e)

# Attempting to assign a row with incompatible shape which also raises `DimensionError` exception:
try:
    sdm[1] = ['sarah', 'west', 30, 'new york']
except DimensionError as e:
    print(e)

Note

  • An argument could be made for using ValueError instead, but there’s enough difference to justify a new error.

exception SQLDataModel.exceptions.SQLProgrammingError

Bases: Exception

Raised when invalid or malformed SQL prevents the execution of a method or returns unexpected behavior, for example, trying to select a column that does not exist in the current model.

Variables:

message (str) – A detailed error message describing the SQL programming error.

Example:

from SQLDataModel import SQLDataModel

# Example headers and data
headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27)
    ,(1, 'sarah', 'west', 29)
    ,(2, 'mike', 'harlin', 36)
    ,(3, 'pat', 'douglas', 42)
]

# Create the sample model
sdm = SQLDataModel(data,headers)

# Query with invalid syntax to raise `SQLProgrammingError` exception:
try:
    sdm = SQLDataModel.fetch_query("selct first, last from sdm where age > 30")
except SQLProgrammingError as e:
    print(e)

# Query for non-existing column to raise `SQLProgrammingError` exception:
try:
    sdm = SQLDataModel.fetch_query("select first, last, date_of_birth from sdm")
except SQLProgrammingError as e:
    print(e)

Note

  • This exception is used to wrap any sqlite3.ProgrammingError encountered during SQL related operations.