Modules & Documentation
SQLDataModel
- class SQLDataModel.SQLDataModel.SQLDataModel(data: list[list] = None, headers: list[str] = None, dtypes: dict[str, str] = None, display_max_rows: int = None, min_column_width: int = 3, max_column_width: int = 38, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic', display_color: str = None, display_index: bool = True, display_float_precision: int = 2, infer_types: bool = False, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round'] = 'default')
Bases:
object
SQLDataModel
Primary class for the package of the same name. Its meant to provide a fast & light-weight alternative to the common pandas, numpy and sqlalchemy setup for moving data in a source/destination agnostic manner. It is not an ORM, any modifications outside of basic joins, group bys and table alteration requires knowledge of SQL. The primary use-case envisaged by the package is one where a table needs to be ETL’d from location A to destination B with arbitrary modifications made if needed:
Summary
Extract your data from SQL, websites or HTML, parquet, JSON, CSV, pandas, numpy, pickle, python dictionaries, lists, etc.
Transform your data using raw SQL or any number of built-in methods covering some of the most used pandas data methods.
Load your data to any number of sources including popular SQL databases, CSV files, JSON, HTML, parquet, pickle, etc.
Usage
from SQLDataModel import SQLDataModel # Lets grab a random table from Wikipedia sdm = SQLDataModel.from_html("https://en.wikipedia.org/wiki/FIFA_World_Cup", table_identifier=6) # Lets see what we found print(sdm)
This will output:
┌───────────────┬──────┬──────┬──────────┬──────────┬──────┬──────┬───────┐ │ Confederation │ AFC │ CAF │ CONCACAF │ CONMEBOL │ OFC │ UEFA │ Total │ ├───────────────┼──────┼──────┼──────────┼──────────┼──────┼──────┼───────┤ │ Teams │ 43 │ 49 │ 46 │ 89 │ 4 │ 258 │ 489 │ │ Top 16 │ 9 │ 11 │ 15 │ 37 │ 1 │ 99 │ 172 │ │ Top 8 │ 2 │ 4 │ 5 │ 36 │ 0 │ 105 │ 152 │ │ Top 4 │ 1 │ 1 │ 1 │ 23 │ 0 │ 62 │ 88 │ │ Top 2 │ 0 │ 0 │ 0 │ 15 │ 0 │ 29 │ 44 │ │ 4th │ 1 │ 1 │ 0 │ 5 │ 0 │ 15 │ 22 │ │ 3rd │ 0 │ 0 │ 1 │ 3 │ 0 │ 18 │ 22 │ │ 2nd │ 0 │ 0 │ 0 │ 5 │ 0 │ 17 │ 22 │ │ 1st │ 0 │ 0 │ 0 │ 10 │ 0 │ 12 │ 22 │ └───────────────┴──────┴──────┴──────────┴──────────┴──────┴──────┴───────┘ [9 rows x 8 columns]
Example:
from SQLDataModel import SQLDataModel # For example, setup a source connection source_db_conn = pyodbc.connect(...) # A destination connection destination_db_conn = sqlite3.connect(...) # Grab your source table sdm = SQLDataModel.from_sql("select * from source_table", source_db_conn) # Modify it however you want, whether through plain SQL sdm = sdm.execute_fetch('select "whatever", "i", "want" from "wherever_i_want" where "what_i_need" is not null ') # Or through any number of built-in methods like filtering sdm = sdm[sdm['create_date'] >= '2023-01-01'] # Or creating new columns sdm['new_date'] = datetime.now() # Or modifying existing ones sdm['salary'] = sdm['salary'] * 2 # Or applying functions sdm['user_id'] = sdm['user_id'].apply(lambda x: x**2) # Or deduplicating sdm = sdm.deduplicate(subset=['user_id','user_name']) # Or iterate through it row-by-row and modify it for idx, row in sdm.iter_tuples(index=True): if row['number'] % 2 == 0: row[idx,'odd_even'] = 'even' else: row[idx,'odd_even'] = 'odd' # Or join it using any of the standard join operations sdm = sdm_left.merge(sdm_right, how='left', left_on='id', right_on='id') # Or group or aggregate the data: sdm_agg = sdm.group_by(["first", "last", "position"]) # Or have your data imported and described for you sdm = SQLDataModel.from_parquet('titanic.parquet').describe() # View result print(sdm)
This will output:
┌────────┬─────────────┬──────────┬────────┬────────┬───────┬────────┐ │ metric │ passengerid │ survived │ pclass │ sex │ age │ fare │ ├────────┼─────────────┼──────────┼────────┼────────┼───────┼────────┤ │ count │ 891 │ 891 │ 891 │ 891 │ 714 │ 891 │ │ unique │ 891 │ 2 │ 3 │ 2 │ 88 │ 248 │ │ top │ 891 │ 0 │ 3 │ male │ 24 │ 8.05 │ │ freq │ 1 │ 549 │ 491 │ 577 │ 30 │ 43 │ │ mean │ 446 │ 0 │ 2 │ NaN │ 29.7 │ 32.2 │ │ std │ 257 │ 0 │ 0 │ NaN │ 14.53 │ 49.69 │ │ min │ 1 │ 0 │ 1 │ female │ 0.42 │ 0 │ │ p25 │ 223 │ 0 │ 2 │ NaN │ 6 │ 7.9 │ │ p50 │ 446 │ 0 │ 3 │ NaN │ 24 │ 14.45 │ │ p75 │ 669 │ 1 │ 3 │ NaN │ 35 │ 31 │ │ max │ 891 │ 1 │ 3 │ male │ 80 │ 512.33 │ │ dtype │ int │ int │ int │ str │ float │ float │ └────────┴─────────────┴──────────┴────────┴────────┴───────┴────────┘ [12 rows x 7 columns]
Move data quickly from one source or format to another:
# Load it to your destination database: sdm.to_sql("new_table", destination_db_conn) # Or any number of formats including: sdm.to_csv("output.csv") sdm.to_html("output.html") sdm.to_json("output.json") sdm.to_latex("output.tex") sdm.to_markdown("output.md") sdm.to_parquet("output.parquet") sdm.to_pickle("output.sdm") sdm.to_text("output.txt") sdm.to_local_db("output.db") # Reload it back again from more formats: sdm = SQLDataModel.from_csv("output.csv") sdm = SQLDataModel.from_dict(py_dict) sdm = SQLDataModel.from_html("output.html") sdm = SQLDataModel.from_json("output.json") sdm = SQLDataModel.from_latex("output.tex") sdm = SQLDataModel.from_markdown("output.md") sdm = SQLDataModel.from_numpy(np_arr) sdm = SQLDataModel.from_pandas(pd_df) sdm = SQLDataModel.from_polars(pl_df) sdm = SQLDataModel.from_parquet("output.parquet") sdm = SQLDataModel.from_pickle("output.sdm") sdm = SQLDataModel.from_sql("output", sqlite3.connect('output.db'))
Data Formats
SQLDataModel
seamlessly interacts with a wide range of data formats providing a versatile platform for data extraction, conversion, and writing. Supported formats include:Arrow
: Convert to and from Apache Arrow format,pyarrow
required.CSV
: Extract from and write to comma separated value,.csv
, files.Excel
: Extract from and write to Excel.xlsx
files,openpyxl
required.HTML
: Extract from web and write to and from.html
files including formatted string literals.JSON
: Extract from and write to.json
files, JSON-like objects, or JSON formatted sring literals.LaTeX
: Extract from and write to.tex
files, LaTeX formatted string literals.Markdown
: Extract from and write to.MD
files, Markdown formatted string literals.Numpy
: Convert to and fromnumpy.ndarray
objects,numpy
required.Pandas
: Convert to and frompandas.DataFrame
objects,pandas
required.Parquet
: Extract from and write to.parquet
files,pyarrow
required.Pickle
: Extract from and write to.pkl
files, package uses.sdm
extension when pickling forSQLDataModel
metadata.Polars
: Convert to and frompolars.DataFrame
objects,polars
required.SQL
: Extract from and write to the following popular SQL databases:SQLite
: Using the built-insqlite3
module.PostgreSQL
: Using thepsycopg2
package.SQL Server
: Using thepyodbc
package.Oracle
: Using thecx_Oracle
package.Teradata
: Using theteradatasql
package.
Text
: Write to and from.txt
files including otherSQLDataModel
string representations.TSV or delimited
: Write to and from files delimited by:\t
: Tab separated values or.tsv
files.\s
: Single space or whitespace separated values.;
: Semicolon separated values.|
: Pipe separated values.:
: Colon separated values.,
: Comma separated values or.csv
files.
Python objects
:dictionaries
: Convert to and from collections of pythondict
objects.lists
: Convert to and from collections of pythonlist
objects.tuples
: Convert to and from collections of pythontuple
objects.namedtuples
: Convert to and from collections ofnamedtuples
objects.
Pretty Printing
SQLDataModel also pretty prints your table in any color you specify, use
SQLDataModel.set_display_color()
and provide either a hex value or a tuple of rgb and print the table, example output:┌───┬─────────────────────┬────────────┬─────────────┬────────┬─────────┐ │ │ full_name │ date │ country │ pin │ service │ ├───┼─────────────────────┼────────────┼─────────────┼────────┼─────────┤ │ 0 │ Pamela Berg │ 2024-09-15 │ New Zealand │ 3010 │ 3.02 │ │ 1 │ Mason Hoover │ 2024-01-23 │ Australia │ 6816 │ 5.01 │ │ 2 │ Veda Suarez │ 2023-09-04 │ Ukraine │ 1175 │ 4.65 │ │ 3 │ Guinevere Cleveland │ 2024-03-22 │ New Zealand │ 4962 │ 3.81 │ │ 4 │ Vincent Mccoy │ 2023-09-16 │ France │ 4446 │ 2.95 │ │ 5 │ Holmes Kemp │ 2024-11-13 │ Germany │ 9396 │ 4.61 │ │ 6 │ Donna Mays │ 2023-06-06 │ Costa Rica │ 8153 │ 5.34 │ │ 7 │ Rama Galloway │ 2023-09-22 │ Italy │ 3384 │ 3.87 │ │ 8 │ Lucas Rodriquez │ 2024-03-16 │ New Zealand │ 3278 │ 2.73 │ │ 9 │ Hunter Donaldson │ 2023-06-30 │ Belgium │ 1593 │ 4.58 │ └───┴─────────────────────┴────────────┴─────────────┴────────┴─────────┘
Note
No additional dependencies are installed with this package, however you will obviously need to have pandas or numpy to create pandas or numpy objects.
Use
SQLDataModel.set_display_color()
to modify the terminal color of the table, by default no color styling is applied.Use
SQLDataModel.get_supported_sql_connections()
to view supported SQL connection packages, please reach out with any issues or questions, thanks!
- static ErrorFormat(error: str) str
Formats an error message with ANSI color coding.
- Parameters:
error – The error message to be formatted.
- Returns:
The modified string with ANSI color coding, highlighting the error type in bold red.
- Return type:
str
Example:
# Format the error message formatted_error = SQLDataModel.ErrorFormat("ValueError: Invalid value provided.") # Should display a colored string to display along with error or exception print(formatted_error)
- static SuccessFormat(success: str) str
Formats a success message with ANSI color coding.
- Parameters:
success – The success message to be formatted.
- Returns:
The modified string with ANSI color coding, highlighting the success source in bold green.
- Return type:
str
Example:
# Message to format formatted_success = SuccessFormat("FileCreated: The operation was successful with new file created.") # Styled message to pass with error or exception print(formatted_success)
- static WarnFormat(warn: str) str
Formats a warning message with ANSI color coding.
- Parameters:
warn – The warning message to be formatted.
- Returns:
The modified string with ANSI color coding, highlighting the class name in bold yellow.
- Return type:
str
Example:
# Warning to format formatted_warning = WarnFormat("DeprecationWarning: This method is deprecated.") # Styled message to pass with error or exception print(formatted_warning)
- __add__(value: str | int | float | SQLDataModel) SQLDataModel
Implements the
+
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (str | int | float | SQLDataModel) – The value to be added to each element in the SQLDataModel.
- Raises:
TypeError – If the provided
value
is not a valid type (str, int, float or SQLDataModel).DimensionError – Raised when the dimensions of the provided
value
are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as addition) on data of shape(4, 1)
with values of shape(3, 2)
will raise this exception.
- Returns:
A new SQLDataModel resulting from the addition operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['x', 'y'] data = [[2,10], [4,20], [8,30], [16,40], [32,50]] # Create the model sdm = SQLDataModel(data, headers) # Perform scalar addition sdm['x + 100'] = sdm['x'] + 100 # Perform vector addition using another column sdm['x + y'] = sdm['x'] + sdm['y'] # View both results print(sdm)
This will output:
┌─────┬─────┬─────────┬───────┐ │ x │ y │ x + 100 │ x + y │ ├─────┼─────┼─────────┼───────┤ │ 2 │ 10 │ 102 │ 12 │ │ 4 │ 20 │ 104 │ 24 │ │ 8 │ 30 │ 108 │ 38 │ │ 16 │ 40 │ 116 │ 56 │ │ 32 │ 50 │ 132 │ 82 │ └─────┴─────┴─────────┴───────┘ [5 rows x 4 columns]
We can also use addition to concatenate strings:
from SQLDataModel import SQLDataModel # Sample data headers = ['First', 'Last'] data = [['Alice', 'Smith'],['Bob', 'Johnson'],['Charlie', 'Hall'],['David', 'Brown']] # Create the model sdm = SQLDataModel(data, headers) # Concatenate scalar character sdm['Loud First'] = sdm['First'] + '!' # Concatenate scalar and vector using existing columns sdm['Full Name'] = sdm['First'] + ' ' + sdm['Last'] # View it print(sdm)
This will output:
┌─────────┬─────────┬────────────┬──────────────┐ │ First │ Last │ Loud First │ Full Name │ ├─────────┼─────────┼────────────┼──────────────┤ │ Alice │ Smith │ Alice! │ Alice Smith │ │ Bob │ Johnson │ Bob! │ Bob Johnson │ │ Charlie │ Hall │ Charlie! │ Charlie Hall │ │ David │ Brown │ David! │ David Brown │ └─────────┴─────────┴────────────┴──────────────┘ [4 rows x 4 columns]
Note
Mixing summands such as
int + float
will work, however an exception will be raised when attempting to perform addition on incompatible types such asstr + float
.
- __eq__(other) SQLDataModel
Implements the is equal to operator
==
for comparingSQLDataModel
againstother
and performing the equivalent set operation against the model’s current indicies.- Parameters:
other – The
SQLDataModel
or scalar (int
,str
,float
) to compare with.- Returns:
A new SQLDataModel containing the result rows of the operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender'] data = [ ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'), ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'), ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'), ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'), ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'), ] # Create the model sdm = SQLDataModel(data, headers) # Filter by 'gender' column sdm = sdm[sdm['gender'] == 'Female'] # View result print(sdm)
This will output:
┌───┬───────┬──────┬──────┬─────────┬────────────┬────────┐ │ │ first │ last │ age │ service │ hire_date │ gender │ ├───┼───────┼──────┼──────┼─────────┼────────────┼────────┤ │ 0 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ Female │ │ 1 │ Sarah │ West │ 51 │ 0.70 │ 2023-10-01 │ Female │ └───┴───────┴──────┴──────┴─────────┴────────────┴────────┘ [2 rows x 6 columns]
Note
For scalar
other
(int, str, or float), compares each element with the scalar and returns the row indicies evaluating toTrue
.For SQLDataModel
other
, compares each element across X rows for Y columns for all (X_i, Y_j) in range ofrow_count
andcolumn_count
and returns those row indicies evaluating toTrue
.All the equality operations return a python
set
object containing the row indicies which were returned from the evaluation.All operations on standard types like
int
,float
orstr
follow standard behavior and are not modified by performing the operations.Operations can be chained using standard
set
operators like&
and|
to allow complex filtering, multiple operations require parenthesis.
- __floordiv__(value: int | float | SQLDataModel) SQLDataModel
Implements the
//
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.
- Raises:
TypeError – If the provided
value
is not a valid type (int, float or SQLDataModel).DimensionError – Raised when the dimensions of the provided
value
are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape(4, 1)
with values of shape(3, 2)
will raise this exception.ZeroDivisionError – If
value
is 0.
- Returns:
A new SQLDataModel resulting from the floor division operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['x', 'y'] data = [[2,10], [4,20], [8,30], [16,40], [32,50]] # Create the model sdm = SQLDataModel(data, headers) # Perform scalar floor division sdm['y // 10'] = sdm['y'] // 10 # Perform vector floor division using another column sdm['y // x'] = sdm['y'] // sdm['x'] # View both results print(sdm)
This will output:
┌─────┬─────┬─────────┬────────┐ │ x │ y │ y // 10 │ y // x │ ├─────┼─────┼─────────┼────────┤ │ 2 │ 10 │ 1 │ 5 │ │ 4 │ 20 │ 2 │ 5 │ │ 8 │ 30 │ 3 │ 3 │ │ 16 │ 40 │ 4 │ 2 │ │ 32 │ 50 │ 5 │ 1 │ └─────┴─────┴─────────┴────────┘ [5 rows x 4 columns]
Note
Mixing divisor types such as
int // float
will work, however an exception will be raised when attempting to perform division on incompatible types such asstr // float
.
- __ge__(other) SQLDataModel
Implements the greater than or equal to operator
>=
for comparingSQLDataModel
againstother
and performing the equivalent set operation against the model’s current indicies.- Parameters:
other – The
SQLDataModel
or scalar (int,str
,float
) to compare with.- Returns:
A new SQLDataModel containing the result rows of the operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender'] data = [ ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'), ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'), ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'), ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'), ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'), ] # Create the model sdm = SQLDataModel(data, headers) # Filter by 'hire_date' column sdm = sdm[sdm['hire_date'] >= datetime.date(2020,1,1)] # View result print(sdm)
This will output:
┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐ │ │ first │ last │ age │ service │ hire_date │ gender │ ├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤ │ 0 │ John │ Smith │ 27 │ 1.22 │ 2023-02-01 │ Male │ │ 1 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ Male │ │ 2 │ Sarah │ West │ 51 │ 0.70 │ 2023-10-01 │ Female │ └───┴───────┴────────┴──────┴─────────┴────────────┴────────┘ [3 rows x 6 columns]
Note
For scalar
other
(int, str, or float), compares each element with the scalar and returns the row indicies evaluating toTrue
.For SQLDataModel
other
, compares each element across X rows for Y columns for all (X_i, Y_j) in range ofrow_count
andcolumn_count
and returns those row indicies evaluating toTrue
.All the equality operations return a python
set
object containing the row indicies which result from the evaluation.All operations on standard types like
int
,float
orstr
follow standard behavior and are not modified by performing the operations.Operations can be chained using standard
set
operators like&
and|
to allow complex filtering, multiple operations require parenthesis.
- __getitem__(target_indicies) SQLDataModel
Retrieves a subset of the SQLDataModel based on the specified indices.
- Parameters:
slc – Indices specifying the rows and columns to be retrieved. This can be an integer, a tuple, a slice, or a combination of these.
- Raises:
ValueError – if there are issues with the specified indices, such as invalid row or column names.
TypeError – if the
slc
type is not compatible with indexing SQLDataModel.IndexError – if the
slc
includes a range or int that is outside of the current row count or column count.
- Returns:
An instance of SQLDataModel containing the selected subset of data.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Retrieve a specific row by index subset_model = sdm[3] # Retrieve multiple rows and specific columns using a tuple subset_model = sdm[(1, 2, 5), ["first_name", "age", "job"]] # Retrieve a range of rows and all columns using a slice subset_model = sdm[2:7] # Retrieve a single column by name subset_model = sdm["first_name"]
- Change Log:
- Version 0.5.0 (2024-05-09):
Modified index retention behavior to pass through row indicies and avoid resetting view order.
Note
The
slc
parameter can be an integer, a tuple of disconnected row indices, a slice representing a range of rows, a string or list of strings representing column names, or a tuple combining row and column indices.The returned SQLDataModel instance will contain the specified subset of rows and columns, retaining the row indicies of the original view.
- __gt__(other) SQLDataModel
Implements the greater than operator
>
for comparingSQLDataModel
againstother
and performing the equivalent set operation against the model’s current indicies.- Parameters:
other – The
SQLDataModel
or scalar (int,str
,float
) to compare with.- Returns:
A new SQLDataModel containing the result rows of the operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender'] data = [ ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'), ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'), ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'), ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'), ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'), ] # Create the model sdm = SQLDataModel(data, headers) # Filter by 'service' column sdm = sdm[sdm['service'] > 5.0] # View result print(sdm)
This will output:
┌───┬───────┬─────────┬──────┬─────────┬────────────┬────────┐ │ │ first │ last │ age │ service │ hire_date │ gender │ ├───┼───────┼─────────┼──────┼─────────┼────────────┼────────┤ │ 0 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ Female │ │ 1 │ Pat │ Douglas │ 42 │ 11.50 │ 2015-11-06 │ Male │ └───┴───────┴─────────┴──────┴─────────┴────────────┴────────┘ [2 rows x 6 columns]
Note
For scalar
other
(int, str, or float), compares each element with the scalar and returns the row indicies evaluating toTrue
.For SQLDataModel
other
, compares each element across X rows for Y columns for all (X_i, Y_j) in range ofrow_count
andcolumn_count
and returns those row indicies evaluating toTrue
.All the equality operations return a python
set
object containing the row indicies which were returned from the evaluation.All operations on standard types like
int
,float
orstr
follow standard behavior and are not modified by performing the operations.Operations can be chained using standard
set
operators like&
and|
to allow complex filtering, multiple operations require parenthesis.
- __iadd__(value: str | int | float | SQLDataModel) SQLDataModel
Implements the
+=
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (str | int | float | SQLDataModel) – The value to be added to each element in the SQLDataModel.
- Raises:
TypeError – If the provided
value
is not a valid type (str, int, float, or SQLDataModel).- Returns:
The modified SQLDataModel after the addition operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['idx', 'first', 'last', 'age', 'service'] data = [ (0, 'john', 'smith', 27, 1.22), (1, 'sarah', 'west', 39, 0.7), (2, 'mike', 'harlin', 36, 3), (3, 'pat', 'douglas', 42, 11.5) ] # Create the model sdm = SQLDataModel(data, headers) # Modifying first name column with a bang! sdm['first'] += '!' # View model print(sdm)
This will output:
┌───┬────────┬─────────┬────────┬─────────┐ │ │ first │ last │ age │ service │ ├───┼────────┼─────────┼────────┼─────────┤ │ 0 │ john! │ smith │ 27 │ 1.22 │ │ 1 │ sarah! │ west │ 39 │ 0.70 │ │ 2 │ mike! │ harlin │ 36 │ 3.00 │ │ 3 │ pat! │ douglas │ 42 │ 11.50 │ └───┴────────┴─────────┴────────┴─────────┘ [4 rows x 4 columns]
- __idiv__(value: int | float | SQLDataModel) SQLDataModel
Implements the
/=
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.
- Raises:
TypeError – If the provided
value
is not a valid type (int, float or SQLDataModel).ZeroDivisionError – If
value
of divisor is 0.
- Returns:
The modified SQLDataModel after the division operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Budget']) # Adjust existing column sdm['Budget'] /= 52
- __ifloordiv__(value: int | float | SQLDataModel) SQLDataModel
Implements the
//=
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.
- Raises:
TypeError – If the provided
value
is not a valid type (int or float).ZeroDivisionError – If
value
is 0.
- Returns:
A new SQLDataModel resulting from the floor division operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['x'] data = [[10],[20],[30],[40],[50]] # Create the model sdm = SQLDataModel(data, headers) # Modify the existing column sdm['x'] //= 3 # View result print(sdm)
This will output:
┌───┬──────┐ │ │ x │ ├───┼──────┤ │ 0 │ 3 │ │ 1 │ 6 │ │ 2 │ 10 │ │ 3 │ 13 │ │ 4 │ 16 │ └───┴──────┘ [5 rows x 1 columns]
- __imul__(value: int | float | SQLDataModel) SQLDataModel
Implements the
*=
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (int | float | SQLDataModel) – The value to multiply each element in the SQLDataModel by.
- Raises:
TypeError – If the provided
value
is not a valid type (int or float).- Returns:
The modified SQLDataModel after the multiplication operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Salary']) # Give raises to all! sdm['Salary'] *= 12
- __init__(data: list[list] = None, headers: list[str] = None, dtypes: dict[str, str] = None, display_max_rows: int = None, min_column_width: int = 3, max_column_width: int = 38, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic', display_color: str = None, display_index: bool = True, display_float_precision: int = 2, infer_types: bool = False, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round'] = 'default')
Initializes a new instance of
SQLDataModel
.- Parameters:
data (list[list]) – The data to populate the model. Should be a list of lists or a list of tuples or a dictionary orientated by rows or columns.
headers (list[str]) – The column headers for the model. If not provided, default headers will be used.
dtypes (dict) – A dictionary specifying the data types for each column. Format: {‘column’: ‘dtype’}.
display_max_rows (int) – The maximum number of rows to display. If not provided, all rows will be displayed.
min_column_width (int) – The minimum width for each column. Default is 3.
max_column_width (int) – The maximum width for each column. Default is 38.
column_alignment (str) – The alignment for columns, must be ‘dynamic’, ‘left’, ‘center’ or ‘right’). Default is ‘dynamic’.
display_color (str|tuple|None) – The color for display as hex code string or rgb tuple.
display_index (bool) – Whether to display row indices. Default is True.
display_float_precision (int) – The number of decimal places to display for float values. Default is 2.
infer_types (bool) – Whether to infer the data types based on a randomly selected sample. Default is False, using first row to derive the corresponding type.
table_style (str) – The styling to use when representing the table, must be ‘ascii’, ‘bare’, ‘dash’, ‘default’, ‘double’, ‘list’, ‘markdown’, ‘outline’, ‘pandas’, ‘polars’, ‘postgresql’ or ‘round’. Default is ‘default’.
- Raises:
ValueError – If
data
andheaders
are not provided, or ifdata
is of insufficient length.TypeError – If
data
orheaders
is not a valid type (list or tuple), or ifdtypes
is not a dictionary.DimensionError – If the length of
headers
does not match the implied column count from the data.SQLProgrammingError – If there’s an issue with executing SQL statements during initialization.
Example:
from SQLDataModel import SQLDataModel # Create sample data data = [('Alice', 20, 'F'), ('Bob', 25, 'M'), ('Gerald', 30, 'M')] # Create the model with custom headers sdm = SQLDataModel(data, headers=['Name','Age','Sex']) # Display the model print(model)
┌────────┬──────┬──────┐ │ Name │ Age │ Sex │ ├────────┼──────┼──────┤ │ Alice │ 20 │ F │ │ Bob │ 25 │ M │ │ Gerald │ 30 │ M │ └────────┴──────┴──────┘ [3 rows x 3 columns]
Note
If
data
is not provided, an empty model is created with headers, at least one ofdata
,headers
ordtypes
are required to instantiate the model.If
headers
are not provided, default headers will be generated using the the format'0', '1', ..., N
whereN
is the column count.If
dtypes
is provided, it must be a dictionary with column names as keys and Python data types as string values, e.g., {‘first_name’: ‘str’, ‘weight’: ‘float’}If
infer_types = True
anddtypes
are provided, the order will be resolved by first inferring the types, then overriding the inferred types for each{col:type}
provided in thedtypes
argument. If one is not provided, then the inferred type will be used as a fallback.
- __ipow__(value: int | float | SQLDataModel) SQLDataModel
Implements the
**=
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (int | float | SQLDataModel) – The value to raise each element in the SQLDataModel to.
- Raises:
TypeError – If the provided
value
is not a valid type (int or float).- Returns:
The modified SQLDataModel after the exponential operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Salary']) # More raises! sdm['Salary'] **= 2
- __isub__(value: int | float | SQLDataModel) SQLDataModel
Implements the
-=
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (int | float | SQLDataModel) – The value to subtract from each element in the SQLDataModel.
- Raises:
TypeError – If the provided
value
is not a valid type (int, float, or SQLDataModel).- Returns:
The modified SQLDataModel after the subtraction operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel headers = ['idx', 'first', 'last', 'age', 'service'] data = [ (0, 'john', 'smith', 27, 1.22), (1, 'sarah', 'west', 39, 0.7), (2, 'mike', 'harlin', 36, 3), (3, 'pat', 'douglas', 42, 11.5) ] # Create the model sdm = SQLDataModel(data, headers) # Modifying age column in the best direction sdm['age'] -= 10 # View model print(sdm)
This will output:
┌───┬────────┬─────────┬────────┬─────────┐ │ │ first │ last │ age │ service │ ├───┼────────┼─────────┼────────┼─────────┤ │ 0 │ john │ smith │ 17 │ 1.22 │ │ 1 │ sarah │ west │ 29 │ 0.70 │ │ 2 │ mike │ harlin │ 26 │ 3.00 │ │ 3 │ pat │ douglas │ 32 │ 11.50 │ └───┴────────┴─────────┴────────┴─────────┘ [4 rows x 4 columns]
- __iter__() Iterator
Iterates over a range of rows in the
SQLDataModel
based on the current model’s row indices.- Raises:
StopIteration – When there are no more rows to return.
- Yields:
tuple
– A row fetched from theSQLDataModel
.
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8) ] # Create the model sdm = SQLDataModel(data, headers) # Iterate through rows for row in sdm: print(row)
This will output:
(0, 'John', 30, 175.3) (1, 'Alice', 28, 162.0) (2, 'Travis', 35, 185.8)
Note
This iterator fetches rows from the
SQLDataModel
using a SQL statement generated by theSQLDataModel._generate_sql_stmt()
method.The iteration starts from the first row, index 0, and continues until
SQLDataModel.row_count
is reached.
- __le__(other) SQLDataModel
Implements the less than or equal to operator
<=
for comparingSQLDataModel
againstother
and performing the equivalent set operation against the model’s current indicies.- Parameters:
other – The
SQLDataModel
or scalar (int
,str
,float
) to compare with.- Returns:
A new SQLDataModel containing the result rows of the operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender'] data = [ ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'), ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'), ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'), ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'), ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'), ] # Create the model sdm = SQLDataModel(data, headers) # Filter by 'age' column sdm = sdm[sdm['age'] <= 40] # View result print(sdm)
This will output:
┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐ │ │ first │ last │ age │ service │ hire_date │ gender │ ├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤ │ 0 │ John │ Smith │ 27 │ 1.22 │ 2023-02-01 │ Male │ │ 1 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ Female │ │ 2 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ Male │ └───┴───────┴────────┴──────┴─────────┴────────────┴────────┘ [3 rows x 6 columns]
Note
For scalar
other
(int, str, or float), compares each element with the scalar and returns the row indicies evaluating toTrue
.For SQLDataModel
other
, compares each element across X rows for Y columns for all (X_i, Y_j) in range ofrow_count
andcolumn_count
and returns those row indicies evaluating toTrue
.All the equality operations return a python
set
object containing the row indicies which were returned from the evaluation.All operations on standard types like
int
,float
orstr
follow standard behavior and are not modified by performing the operations.Operations can be chained using standard
set
operators like&
and|
to allow complex filtering, multiple operations require parenthesis.
- __len__() int
Returns the
SQLDataModel.row_count
property for the currentSQLDataModel
which represents the current number of rows in the model.- Returns:
The total number of rows in the SQLDataModel.
- Return type:
int
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Get current length num_rows = len(sdm) # View number print(num_rows)
This will output:
1000
- __lt__(other) SQLDataModel
Implements the less than operator
<
for comparingSQLDataModel
againstother
and performing the equivalent set operation against the model’s current indicies.- Parameters:
other – The
SQLDataModel
or scalar (int
,str
,float
) to compare with.- Returns:
A new SQLDataModel containing the result rows of the operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender'] data = [ ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'), ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'), ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'), ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'), ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'), ] # Create the model sdm = SQLDataModel(data, headers) # Filter by 'age' column sdm = sdm[sdm['age'] < 40] # View result print(sdm)
This will output:
┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐ │ │ first │ last │ age │ service │ hire_date │ gender │ ├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤ │ 0 │ John │ Smith │ 27 │ 1.22 │ 2023-02-01 │ Male │ │ 1 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ Female │ │ 2 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ Male │ └───┴───────┴────────┴──────┴─────────┴────────────┴────────┘ [3 rows x 6 columns]
Note
For scalar
other
(int, str, or float), compares each element with the scalar and returns the row indicies evaluating toTrue
.For SQLDataModel
other
, compares each element across X rows for Y columns for all (X_i, Y_j) in range ofrow_count
andcolumn_count
and returns those row indicies evaluating toTrue
.All the equality operations return a python
set
object containing the row indicies which were returned from the evaluation.All operations on standard types like
int
,float
orstr
follow standard behavior and are not modified by performing the operations.Operations can be chained using standard
set
operators like&
and|
to allow complex filtering, multiple operations require parenthesis.
- __mul__(value: int | float | SQLDataModel) SQLDataModel
Implements the
*
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (int | float | SQLDataModel) – The value to multiply each element in the SQLDataModel by.
- Raises:
TypeError – If the provided
value
is not a valid type (int, float or SQLDataModel).DimensionError – Raised when the dimensions of the provided
value
are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as multiplication) on data of shape(4, 1)
with values of shape(3, 2)
will raise this exception.
- Returns:
A new SQLDataModel resulting from the multiplication operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['x', 'y'] data = [[2,10], [4,20], [8,30], [16,40], [32,50]] # Create the model sdm = SQLDataModel(data, headers) # Perform scalar multiplication sdm['x * 10'] = sdm['x'] * 10 # Perform vector multiplication using another column sdm['x * y'] = sdm['x'] * sdm['y'] # View results print(sdm)
This will output:
┌─────┬─────┬────────┬───────┐ │ x │ y │ x * 10 │ x * y │ ├─────┼─────┼────────┼───────┤ │ 2 │ 10 │ 20 │ 20 │ │ 4 │ 20 │ 40 │ 80 │ │ 8 │ 30 │ 80 │ 240 │ │ 16 │ 40 │ 160 │ 640 │ │ 32 │ 50 │ 320 │ 1600 │ └─────┴─────┴────────┴───────┘ [5 rows x 4 columns]
Note
Mixing multipliers such as
int * float
will work, however an exception will be raised when attempting to perform multiplication on incompatible types such asstr * float
.
- __ne__(other) SQLDataModel
Implements the not equal to operator
!=
for comparingSQLDataModel
againstother
and performing the equivalent set operation against the model’s current indicies.- Parameters:
other – The
SQLDataModel
or scalar (int
,str
,float
) to compare with.- Returns:
A new SQLDataModel containing the result rows of the operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender'] data = [ ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'), ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'), ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'), ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'), ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'), ] # Create the model sdm = SQLDataModel(data, headers) # Filter by 'first' column sdm = sdm[sdm['first'] != 'John'] # View result print(sdm)
This will output:
┌───┬───────┬─────────┬──────┬─────────┬────────────┬────────┐ │ │ first │ last │ age │ service │ hire_date │ gender │ ├───┼───────┼─────────┼──────┼─────────┼────────────┼────────┤ │ 0 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ Female │ │ 1 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ Male │ │ 2 │ Sarah │ West │ 51 │ 0.70 │ 2023-10-01 │ Female │ │ 3 │ Pat │ Douglas │ 42 │ 11.50 │ 2015-11-06 │ Male │ └───┴───────┴─────────┴──────┴─────────┴────────────┴────────┘ [4 rows x 6 columns]
Note
For scalar
other
(int, str, or float), compares each element with the scalar and returns the row indicies evaluating toTrue
.For SQLDataModel
other
, compares each element across X rows for Y columns for all (X_i, Y_j) in range ofrow_count
andcolumn_count
and returns those row indicies evaluating toTrue
.All the equality operations return a python
set
object containing the row indicies which were returned from the evaluation.All operations on standard types like
int
,float
orstr
follow standard behavior and are not modified by performing the operations.Operations can be chained using standard
set
operators like&
and|
to allow complex filtering, multiple operations require parenthesis.
- __pow__(value: int | float | SQLDataModel) SQLDataModel
Implements the
**
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (int | float | SQLDataModel) – The exponent value to raise each element in the SQLDataModel to.
- Raises:
TypeError – If the provided
value
is not a valid type (int, float or SQLDataModel).DimensionError – Raised when the dimensions of the provided
value
are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as exponentiation) on data of shape(4, 1)
with values of shape(3, 2)
will raise this exception.
- Returns:
A new SQLDataModel resulting from the exponential operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['x', 'y'] data = [[2,1], [4,2], [8,3], [16,4], [32,5]] # Create the model sdm = SQLDataModel(data, headers) # Perform scalar exponentiation sdm['y ** 2'] = sdm['y'] ** 2 # Perform vector exponentiation using another column sdm['x ** y'] = sdm['x'] ** sdm['y'] # View results print(sdm)
This will output:
┌─────┬─────┬────────┬──────────┐ │ x │ y │ y ** 2 │ x ** y │ ├─────┼─────┼────────┼──────────┤ │ 2 │ 1 │ 1 │ 2 │ │ 4 │ 2 │ 4 │ 16 │ │ 8 │ 3 │ 9 │ 512 │ │ 16 │ 4 │ 16 │ 65536 │ │ 32 │ 5 │ 25 │ 33554432 │ └─────┴─────┴────────┴──────────┘ [5 rows x 4 columns]
Note
Mixing exponent types such as
int ** float
will work, however an exception will be raised when attempting to exponentiate incompatible types such asstr ** float
.
- __repr__() str
Returns a pretty printed string representation of
SQLDataModel
formatted to the current terminal size.- Returns:
The string representation of the SQLDataModel instance output using display and format values set on instance.
- Return type:
str
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['idx', 'first', 'last', 'age'] data = [ (0, 'john', 'smith', 27) ,(1, 'sarah', 'west', 29) ,(2, 'mike', 'harlin', 36) ,(3, 'pat', 'douglas', 42) ] # Create the model sdm = SQLDataModel(data,headers) # Display the string representation print(sdm)
This will output the default alignment, dynamically aligning columns based on their dtype, right-aligned for numeric, left otherwise:
┌───┬────────┬─────────┬────────┐ │ │ first │ last │ age │ ├───┼────────┼─────────┼────────┤ │ 0 │ john │ smith │ 27 │ │ 1 │ sarah │ west │ 29 │ │ 2 │ mike │ harlin │ 36 │ │ 3 │ pat │ douglas │ 42 │ └───┴────────┴─────────┴────────┘ [4 rows x 3 columns]
Using
'left'
column alignment:# Using left alignment instead sdm.set_column_alignment("left") # See difference print(sdm)
This will output:
┌───┬────────┬─────────┬────────┐ │ │ first │ last │ age │ ├───┼────────┼─────────┼────────┤ │ 0 │ john │ smith │ 27 │ │ 1 │ sarah │ west │ 29 │ │ 2 │ mike │ harlin │ 36 │ │ 3 │ pat │ douglas │ 42 │ └───┴────────┴─────────┴────────┘ [4 rows x 3 columns]
Using
'center'
column alignment:# Using center alignment instead sdm.set_column_alignment("center") # See difference print(sdm)
This will output:
┌───┬────────┬─────────┬────────┐ │ │ first │ last │ age │ ├───┼────────┼─────────┼────────┤ │ 0 │ john │ smith │ 27 │ │ 1 │ sarah │ west │ 29 │ │ 2 │ mike │ harlin │ 36 │ │ 3 │ pat │ douglas │ 42 │ └───┴────────┴─────────┴────────┘ [4 rows x 3 columns]
Using
'right'
column alignment:# Using right alignment instead sdm.set_column_alignment("right") # See difference print(sdm)
This will output:
┌───┬────────┬─────────┬────────┐ │ │ first │ last │ age │ ├───┼────────┼─────────┼────────┤ │ 0 │ john │ smith │ 27 │ │ 1 │ sarah │ west │ 29 │ │ 2 │ mike │ harlin │ 36 │ │ 3 │ pat │ douglas │ 42 │ └───┴────────┴─────────┴────────┘ [4 rows x 3 columns]
Note
Use
SQLDataModel.set_display_max_rows()
to explicitly set vertical height and modify vertical truncation behavior, which uses current terminal height by default.Use
SQLDataModel.set_min_column_width()
andSQLDataModel.set_max_column_width()
to adjust column widths and modify horizontal truncation behavior.Use
SQLDataModel.set_column_alignment()
to modify column alignment, available options are dynamic alignment based on dtype, left, center or right alignment.Use
SQLDataModel.set_display_color()
to modify the table color, by default no color is applied with characters drawn using platform specific settings.Use
SQLDataModel.set_table_style()
to modify the table style format and box characters used to draw the table.
- __setitem__(target_indicies, update_values) None
Updates specified rows and columns in the SQLDataModel with the provided values.
- Parameters:
target_indicies – Indices specifying the rows and columns to be updated. This can be an integer, a tuple, a slice, or a combination of these.
update_values – The values to be assigned to the corresponding model records. It can be of types: str, int, float, bool, bytes, list, tuple, or another SQLDataModel object.
- Raises:
TypeError – If the
update_values
type is not compatible with SQL datatypes.DimensionError – If there is a shape mismatch between targeted indicies and provided update values.
ValueError – If there are issues with the specified indices, such as invalid row or column names.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Job'] data = [ ('Billy', 30, 'Barber'), ('Alice', 28, 'Doctor'), ('John', 25, 'Technician'), ('Travis', 35, 'Musician'), ('William', 15, 'Student') ] # Create the model sdm = SQLDataModel(data, headers) # Update a specific row with new values sdm[2] = ("John", 25, "Engineer") # See result print(sdm)
This will output:
┌───┬─────────┬──────┬──────────┐ │ │ Name │ Age │ Job │ ├───┼─────────┼──────┼──────────┤ │ 0 │ Billy │ 30 │ Barber │ │ 1 │ Alice │ 28 │ Doctor │ │ 2 │ John │ 25 │ Engineer │ │ 3 │ Travis │ 35 │ Musician │ │ 4 │ William │ 15 │ Student │ └───┴─────────┴──────┴──────────┘ [5 rows x 3 columns]
# Update multiple rows and columns with a list of values sdm[1:5, ["Name", "Age", "Job"]] = [("Alice", 30, "Manager"), ("Bob", 28, "Developer"), ("Charlie", 35, "Designer"), ("David", 32, "Analyst")] # See result print(sdm)
This will output:
┌───┬─────────┬──────┬───────────┐ │ │ Name │ Age │ Job │ ├───┼─────────┼──────┼───────────┤ │ 0 │ Billy │ 30 │ Barber │ │ 1 │ Alice │ 30 │ Manager │ │ 2 │ Bob │ 28 │ Developer │ │ 3 │ Charlie │ 35 │ Designer │ │ 4 │ David │ 32 │ Analyst │ └───┴─────────┴──────┴───────────┘ [5 rows x 3 columns]
# Create a new column "Hobby" and set the values sdm["Hobby"] = [('Fishing',), ('Biking',), ('Computers',), ('Photography',), ('Studying',)] # See result print(sdm)
This will output:
┌───┬─────────┬──────┬───────────┬─────────────┐ │ │ Name │ Age │ Job │ Hobby │ ├───┼─────────┼──────┼───────────┼─────────────┤ │ 0 │ Billy │ 30 │ Barber │ Fishing │ │ 1 │ Alice │ 30 │ Manager │ Biking │ │ 2 │ Bob │ 28 │ Developer │ Computers │ │ 3 │ Charlie │ 35 │ Designer │ Photography │ │ 4 │ David │ 32 │ Analyst │ Studying │ └───┴─────────┴──────┴───────────┴─────────────┘ [5 rows x 4 columns]
Note
If
update_values
is another SQLDataModel object, its data will be normalized using theSQLDataModel.data()
method.The
target_indicies
parameter can be an integer, a tuple of disconnected row indices, a slice representing a range of rows, a string or list of strings representing column names, or a tuple combining row and column indices.Values can be single values or iterables matching the specified rows and columns.
- __sub__(value: int | float | SQLDataModel) SQLDataModel
Implements the
-
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (int | float | SQLDataModel) – The value to subtract from each element in the SQLDataModel.
- Raises:
TypeError – If the provided
value
is not a valid type (int or float).DimensionError – Raised when the dimensions of the provided
value
are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as subtraction) on data of shape(4, 1)
with values of shape(3, 2)
will raise this exception.
- Returns:
A new SQLDataModel resulting from the subtraction operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['x', 'y'] data = [[2,10], [4,20], [8,30], [16,40], [32,50]] # Create the model sdm = SQLDataModel(data, headers) # Perform scalar subtraction sdm['x - 100'] = sdm['x'] - 100 # Perform vector subtraction using another column sdm['x - y'] = sdm['x'] - sdm['y'] # View both results print(sdm)
This will output:
┌─────┬─────┬─────────┬───────┐ │ x │ y │ x - 100 │ x - y │ ├─────┼─────┼─────────┼───────┤ │ 2 │ 10 │ -98 │ -8 │ │ 4 │ 20 │ -96 │ -16 │ │ 8 │ 30 │ -92 │ -22 │ │ 16 │ 40 │ -84 │ -24 │ │ 32 │ 50 │ -68 │ -18 │ └─────┴─────┴─────────┴───────┘ [5 rows x 4 columns]
Note
Mixing subtractors such as
int + float
will work, however an exception will be raised when attempting to perform subtraction on incompatible types such asstr - float
.
- __truediv__(value: int | float | SQLDataModel) SQLDataModel
Implements the
/
operator functionality for compatibleSQLDataModel
operations.- Parameters:
value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.
- Raises:
TypeError – If the provided
value
is not a valid type (int, float or SQLDataModel).DimensionError – Raised when the dimensions of the provided
value
are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape(4, 1)
with values of shape(3, 2)
will raise this exception.ZeroDivisionError – If
value
is 0.
- Returns:
A new SQLDataModel resulting from the division operation.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['x', 'y'] data = [[2,10], [4,20], [8,30], [16,40], [32,50]] # Create the model sdm = SQLDataModel(data, headers) # Perform scalar division sdm['y / 10'] = sdm['y'] / 10 # Perform vector division using another column sdm['y / x'] = sdm['y'] / sdm['x'] # View both results print(sdm)
This will output:
┌─────┬─────┬────────┬───────┐ │ x │ y │ y / 10 │ y / x │ ├─────┼─────┼────────┼───────┤ │ 2 │ 10 │ 1.00 │ 5.00 │ │ 4 │ 20 │ 2.00 │ 5.00 │ │ 8 │ 30 │ 3.00 │ 3.75 │ │ 16 │ 40 │ 4.00 │ 2.50 │ │ 32 │ 50 │ 5.00 │ 1.56 │ └─────┴─────┴────────┴───────┘ [5 rows x 4 columns]
Note
Mixing divisor types such as
int / float
will work, however an exception will be raised when attempting to perform division on incompatible types such asstr / float
.
- _generate_sql_stmt(columns: list[str] = None, rows: int | slice | tuple | str = None, index: bool = True, na_rep: str = None) str
Generate an SQL statement for fetching specific columns and rows from the model, duplicate column references are aliased in order of appearance.
- Parameters:
columns (list of str, optional) – The list of columns to include in the SQL statement. If not provided, all columns from the model will be included.
rows (int, slice, tuple, optional) – The rows to include in the SQL statement. It can be an integer for a single row, a slice for a range of rows, or a tuple for specific row indices. If not provided, all rows will be included.
index (bool, optional) – If True, include the primary index column in the SQL statement.
na_rep (str, optional) – If provided, all null or empty string values are replaced with value.
- Returns:
The generated SQL statement.
- Return type:
str
- Change Log:
- Version 0.5.1 (2024-05-10):
Modified to allow
rows
argument to be provided directly as a string predicate to bypass numeric range-based selections.
- Version 0.4.0 (2024-04-23):
Added
nap_rep
parameter to fill null or missing fields with provided value.
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
No validation is performed on row or column indicies, see
SQLDataModel.validate_indicies()
for implementation and usage.
- _generate_table_style(style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round'] = None) tuple[tuple[str]]
Generates the character sets required for formatting
SQLDataModel
according to the value currently set atSQLDataModel.table_style
.- Parameters:
style (Literal['ascii','bare','dash','default','double','list','markdown','outline','pandas','polars','postgresql','round'], optional) – The table style to return. Default is value set on
SQLDataModel.table_style
.- Returns:
A 4-tuple containing the characters required for top, middle, row and lower table sections.
- Return type:
tuple[tuple[str]]
- Change Log:
- Version 0.3.10 (2024-04-16):
Added
style
parameter to allow use bySQLDataModel.to_text()
to generate new formatting styles introduced in version 0.3.9.
Note
This method is called by
SQLDataModel.__repr__()
to parse the characters necessary for constructing the tabular representation of theSQLDataModel
, any modifications or changes to this method may result in unexpected behavior.
- _get_display_args(include_dtypes: bool = False) dict
Retrieves the current display configuration settings of the
SQLDataModel
with the correctkwargs
for the classSQLDataModel.__init__()
method.- Parameters:
include_dtypes (bool, optional) – Whether
SQLDataModel.dtypes
should be included in the result. Default is False, including only display arguments.- Returns:
A dictionary containing the display configuration settings in the format
{'setting': 'value'}
.- Return type:
dict
- Display Properties:
SQLDataModel.display_max_rows
: The maximum number of rows to display.SQLDataModel.min_column_width
: The minimum width of columns when displaying the model.SQLDataModel.max_column_width
: The maximum width of columns when displaying the model.SQLDataModel.column_alignment
: The alignment of columns (‘left’, ‘center’, ‘right’ or ‘dynamic’).SQLDataModel.display_color
: The color to use when displaying the table, None by default.SQLDataModel.display_index
: True if displaying index column, False otherwise.SQLDataModel.display_float_precision
: The precision for displaying floating-point numbers.SQLDataModel.table_style
: The table styling format to use for strng representations of the model.
- Dtype Property:
SQLDataModel.dtypes
: A dictionary mapping the current model’s columns to their corresponding Python data type.
- Change Log:
- Version 0.6.2 (2024-05-15):
Added
include_dtypes
parameter for use by methods such asSQLDataModel.min()
andSQLDataModel.max()
for operations that require returning the results of SQL fetch statements.
- _get_sql_create_stmt() str
Retrieves the SQL create table statement used to create the current SQLDataModel.
- Returns:
The SQL create table statement for the SQLDataModel.
- Return type:
str
Example:
from SQLDataModel import SQLDataModel headers = ['idx', 'first', 'last', 'age'] data = [ (0, 'john', 'smith', 27) ,(1, 'sarah', 'west', 29) ,(2, 'mike', 'harlin', 36) ,(3, 'pat', 'douglas', 42) ] # Create the sample model sdm = SQLDataModel(data,headers) # Retrieve the create statement for the SQLDataModel create_stmt = sdm._get_sql_create_stmt() # Print the returned statement print(create_stmt)
This will output:
CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY,"first" TEXT,"last" TEXT,"age" INTEGER)
- _update_indicies() None
Updates the
SQLDataModel.indicies
andSQLDataModel.row_count
properties of theSQLDataModel
instance representing the current valid row indicies and count.- Returns:
None
Note
This method is called internally any time the
SQLDataModel.row_count
property is subject to change, or data manipulation requires updating the current values.There is no reason to call this method manually unless the model has been changed outside of the standard instance methods.
- _update_indicies_deterministic(row_index: int) None
Quick implementation to update the
SQLDataModel.indicies
andSQLDataModel.row_count
properties of theSQLDataModel
instance representing the current valid row indicies and count based on the last inserted rowid.- Returns:
None
- Change Log:
- Version 0.6.0 (2024-05-14):
New method, improves performance for updating row indicies when update is deterministic.
Note
This method is called internally any time the
SQLDataModel.row_count
property is subject to deterministic change to avoid the more expensive call toSQLDataModel._update_indicies()
- _update_model_metadata(update_row_meta: bool = False) None
Generates and updates metadata information about the columns and optionally the rows in the
SQLDataModel
instance based on the current model.- Attributes updated:
SQLDataModel.header_master
: Master dictionary of column metadata.SQLDataModel.headers
: List of current model headers, order retained.SQLDataModel.column_count
: Number of columns in current model.SQLDataModel.shape
: The current(rows, cols)
dimensions of the model.SQLDataModel.dtype
: The current{'col': 'dtype'}
mapping of the model.SQLDataModel.indicies
: Optionally updated, represents current valid row indicies.SQLDataModel.row_count
: Optionally updated, represents current row count.
- Parameters:
update_row_meta (bool, optional) – If True, updates row metadata information; otherwise, retrieves column metadata only (default).
- Returns:
None
Example:
from SQLDataModel import SQLDataModel headers = ['idx', 'first', 'last', 'age', 'service_time'] data = [ (0, 'john', 'smith', 27, 1.22), (1, 'sarah', 'west', 39, 0.7), (2, 'mike', 'harlin', 36, 3), (3, 'pat', 'douglas', 42, 11.5) ] # Create the model with sample data sdm = SQLDataModel(data, headers) # View header master print(sdm.header_master)
This will output:
{'first': ('TEXT', 'str', True, '<'), 'last': ('TEXT', 'str', True, '<'), 'age': ('INTEGER', 'int', True, '>'), 'service_time': ('REAL', 'float', True, '>'), 'idx': ('INTEGER', 'int', False, '>')}
Example Attributes Modified:
from SQLDataModel import SQLDataModel headers = ['idx', 'first', 'last', 'age', 'service_time'] data = [ (0, 'john', 'smith', 27, 1.22), (1, 'sarah', 'west', 0.7), (2, 'mike', 'harlin', 3), (3, 'pat', 'douglas', 11.5) ] # Create the model with sample data sdm = SQLDataModel(data, headers) # Get current column count num_cols_before = sdm.column_count # Add new column sdm['new_column'] = 'empty' # Method is called behind the scenes sdm._update_model_metadata() # Get new column count num_cols_after = sdm.column_count # View difference print(f"cols before: {num_cols_before}, cols after: {num_cols_after}")
Note
This method is called after operations that may modify the current model’s structure and require synchronization.
- _update_rows_and_columns_with_values(rows_to_update: tuple[int] = None, columns_to_update: list[str] = None, values_to_update: list[tuple] = None) None
Generates and executes a SQL update statement to modify specific rows and columns with provided values in the SQLDataModel.
- Parameters:
rows_to_update – A tuple of row indices to be updated. If set to None, it defaults to all rows in the SQLDataModel.
columns_to_update – A list of column names to be updated. If set to None, it defaults to all columns in the SQLDataModel.
values_to_update – A list of tuples representing values to update in the specified rows and columns.
- Raises:
TypeError – If the
values_to_update
parameter is not a list or tuple.DimensionError – If the shape of the provided values does not match the specified rows and columns.
SQLProgrammingError – If the
values_to_update
parameter contains invalid or SQL incompatible data.
Example:
from SQLDataModel import SQLDataModel # Update specific rows and columns with provided values sdm._update_rows_and_columns_with_values( rows_to_update=(1, 2, 3), columns_to_update=["column1", "column2"], values_to_update=[(10, 'A'), (20, 'B'), (30, 'C')] ) # Create a new column named "new_column" with default values sdm._update_rows_and_columns_with_values( columns_to_update=["new_column"], values_to_update=[(None,)] * sdm.row_count )
Note
Used by
SQLDataModel.__setitem__()
to broadcast updates across row and column index ranges.To create a new column, pass a single header item in a list to the
columns_to_update
parameter.To copy an existing column, pass the corresponding data is a list of tuples to the
values_to_update
parameter.
- add_column_with_values(column_name: str, value=None) None
Adds a new column with the specified
column_name
to theSQLDataModel
. The new column is populated with the values provided in thevalue
argument. Ifvalue
is not provided (default), the new column is populated with NULL values.- Parameters:
column_name (str) – The name of the new column to be added.
value – The value to populate the new column. If None (default), the column is populated with NULL values. If a valid column name is provided, the values of that column will be used to fill the new column.
- Raises:
DimensionError – If the length of the provided values does not match the number of rows in the model.
TypeError – If the data type of the provided values is not supported or translatable to an SQL data type.
Example:
from SQLDataModel import SQLDataModel # Create model from data sdm = SQLDataModel.from_csv('data.csv') # Add new column with default value 42 sdm.add_column_with_values('new_column', value=42) # Add new column by copying values from an existing column sdm.add_column_with_values('new_column', value='existing_column')
Note
Many other methods, including
SQLDataModel.__setitem__()
rely on this method, therefore modifying it may cause unpredictable behavior.Determination for when to copy existing versus when to assign string is value is done by
SQLDataModel.__eq__()
against both values
- static alias_duplicates(headers: list) Generator
Rename duplicate column names in a given list by appending an underscore and a numerical suffix.
- Parameters:
headers (list) – A list of column names that require parsing for duplicates.
- Yields:
Generator
– A generator object that yields the original or modified column names.
Example:
from SQLDataModel import SQLDataModel # Original list of column names with duplicates original_headers = ['ID', 'Name', 'Amount', 'Name', 'Date', 'Amount'] # Use the static method to return a generator for the duplicates renamed_generator = SQLDataModel.alias_duplicates(original_headers) # Obtain the modified column names modified_headers = list(renamed_generator) # View modified column names print(modified_headers) # Output modified_headers = ['ID', 'Name', 'Amount', 'Name_2', 'Date', 'Amount_2']
Example of implementation for SQLDataModel:
# Given a list of headers original_headers = ['ID', 'ID', 'Name', 'Name', 'Name', 'Unique'] # Create a separate list for aliasing duplicates aliased_headers = list(SQLDataModel.alias_duplicates(original_headers)) # View aliases for col, alias in zip(original_headers, aliased_headers): print(f"{col} as {alias}")
This will output:
ID as ID ID as ID_2 Name as Name Name as Name_2 Name as Name_3 Unique as Unique
- Change Log:
- Version 0.3.4 (2024-04-05):
Modified to re-alias partially aliased input to prevent runaway incrementation on suffixes.
- append_row(values: list | tuple = None) None
Appends
values
as a new row in theSQLDataModel
at the next available index based on the current max row index fromSQLDataModel.indicies
. Ifvalues = None
, an empty row with SQLnull
values will be used.- Parameters:
values (list or tuple, optional) – The values to be inserted into the row. If not provided or set to None, an empty row with SQL
null
values will be inserted.- Raises:
TypeError – If
values
is provided and is not of type list or tuple.DimensionError – If the number of values provided does not match the current column count.
SQLProgrammingError – If there is an issue with the SQL execution during the insertion.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create a rowless model sdm = SQLDataModel(headers=['Name', 'Age']) # Append a row with values sdm.append_row(['Alice', 31]) # Append another row sdm.append_row(['John', 48]) # View result print(sdm)
This will output:
┌───┬───────┬──────┐ │ │ Name │ Age │ ├───┼───────┼──────┤ │ 0 │ Alice │ 31 │ │ 1 │ John │ 48 │ └───┴───────┴──────┘ [2 rows x 2 columns]
- Change Log:
- Version 0.6.0 (2024-05-14):
New method, mirrors previous behavior of
SQLDataModel.insert_row()
for versions <= 0.5.2.
Note
If no values are provided,
None
or SQL ‘null’ will be used for the values.Rows will be appended to the bottom of the model at one index greater than the current max index.
- apply(func: Callable) SQLDataModel
Applies
func
to the currentSQLDataModel
object and returns a modifiedSQLDataModel
by passing its current values to the argument offunc
updated with the output.- Parameters:
func (Callable) – A callable function to apply to the
SQLDataModel
.- Raises:
TypeError – If the provided argument for
func
is not a valid callable.SQLProgrammingError – If the provided function is not valid based on the current SQL datatypes.
- Returns:
A modified
SQLDataModel
resulting from the application offunc
.- Return type:
SQLDataModel
Examples:
Applying to Single Column
from SQLDataModel import SQLDataModel # Create the SQLDataModel: sdm = SQLDataModel.from_csv('employees.csv', headers=['First Name', 'Last Name', 'City', 'State']) # Create the function: def uncase_name(x): return x.lower() # Apply to existing column: sdm['First Name'] = sdm['First Name'].apply(uncase_name) # existing column will be updated with new values # Or create new one by passing in a new column name: sdm['New Column'] = sdm['First Name'].apply(uncase_name) # new column will be created with returned values
Applying to Multiple Columns
from SQLDataModel import SQLDataModel # Create the function, note that ``func`` must have the same number of args as the model ``.apply()`` is called on: def summarize_employee(first, last, city, state) summary = f"{first} {last} is from {city}, {state}" # Create a new 'Employee Summary' column for the returned values: sdm['Employee Summary'] = sdm.apply(summarize_employee)
Applying a Built-in Function
import math from SQLDataModel import SQLDataModel # Create the SQLDataModel: sdm = SQLDataModel.from_csv('number-data.csv', headers=['Number']) # Apply the math.sqrt function to the original 'Number' column: sdm_sqrt = sdm.apply(math.sqrt)
Applying a Lambda Function
from SQLDataModel import SQLDataModel # Create the SQLDataModel: sdm = SQLDataModel.from_csv('example.csv', headers=['Column1', 'Column2']) # Create a new 'Column3' using the values returned from the lambda function: sdm['Column3'] = sdm.apply(lambda x, y: x + y, new_column_name='Sum_Columns')
Note
The number of
args
in the inspected signature offunc
must equal the current number ofSQLDataModel
columns.The number of
func
args must match the current number of columns in the model, or anException
will be raised.Use
SQLDataModel.generate_apply_function_stub()
method to return a preconfigured template using currentSQLDataModel
columns and dtypes to assist.
- apply_function_to_column(func: Callable, column: str | int) None
Applies the specified callable function (func) to the provided
SQLDataModel
column. The function’s output is used to update the values in the column. For broader uses or more input flexibility, see related method apply().- Parameters:
func (Callable) – The callable function to apply to the column.
column (str | int) – The name or index of the column to which the function will be applied.
- Raises:
TypeError – If the provided column argument is not a valid type (str or int).
IndexError – If the provided column index is outside the valid range of column indices.
ValueError – If the provided column name is not valid for the current model.
SQLProgrammingError – If the provided function return types or arg count is invalid or incompatible to SQL types.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('data.csv') # Apply upper() method using lambda function to column ``name`` sdm.apply_function_to_column(lambda x: x.upper(), column='name') # Apply addition through lambda function to column at index 1 sdm.apply_function_to_column(lambda x, y: x + y, column=1)
Note
This method is a simplified version of the
SQLDataModel.apply()
method, which can be used for arbitrary function params and inputs.If providing a function name, ensure it can be used a valid
sqlite3
identifier for the instance’s connection otherwiseSQLProgrammingError
will be raised.
- astype(dtype: Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str']) SQLDataModel
Casts the model data into the specified python
dtype
.- Parameters:
dtype (Literal['bool', 'bytes', 'datetime', 'float', 'int', 'None', 'str']) – The target python data type to cast the values to.
- Raises:
ValueError – If
dtype
is not one of ‘bool’, ‘bytes’, ‘datetime’, ‘float’, ‘int’, ‘None’, ‘str’.- Returns:
The data casted as the specified type as a new
SQLDataModel
.- Return type:
SQLDataModel
Warning
Type casting will coerce any nonconforming values to the
dtype
being set, this means data will be lost if casting values to incompatible types.
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height', 'Hired'] data = [ ('John', 30, 175.3, 'True'), ('Alice', 28, 162.0, 'True'), ('Travis', 35, 185.8, 'False') ] # Create the model sdm = SQLDataModel(data, headers) # See what we're working with print(sdm)
This will output:
┌────────┬──────┬─────────┬───────┐ │ Name │ Age │ Height │ Hired │ ├────────┼──────┼─────────┼───────┤ │ John │ 30 │ 175.30 │ True │ │ Alice │ 28 │ 162.00 │ True │ │ Travis │ 35 │ 185.80 │ False │ └────────┴──────┴─────────┴───────┘ [3 rows x 4 columns]
We can return the values as new types or save them to a column:
# Convert the string based 'Hired' column to boolean values sdm['Hired'] = sdm['Hired'].astype('bool') # Let's also create a new 'Height' column, this time as an integer sdm['Height int'] = sdm['Height'].astype('int') # See the new values and their types print(sdm)
This will output:
┌────────┬──────┬─────────┬───────┬────────────┐ │ Name │ Age │ Height │ Hired │ Height int │ ├────────┼──────┼─────────┼───────┼────────────┤ │ John │ 30 │ 175.30 │ 1 │ 175 │ │ Alice │ 28 │ 162.00 │ 1 │ 162 │ │ Travis │ 35 │ 185.80 │ 0 │ 185 │ └────────┴──────┴─────────┴───────┴────────────┘ [3 rows x 5 columns]
Note
Unless the returned values are saved as a new column, using this method does not change the underlying column’s type currently assigned to it, to modify the column type use
SQLDataModel.set_column_dtypes()
instead.Any
None
ornull
values encountered will not be coerced to the specifieddtype
, seeSQLDataModel.fillna()
for handling and filling null values appropriately.
- column_alignment
The column alignment to use for string representations of the data, value must be one of
['dynamic','left','center','right']
Default is'dynamic'
, using right-alignment for numeric columns and left-aligned for all others.- Type:
str
- column_count
The current column count of the model.
- Type:
int
- concat(other: SQLDataModel | list | tuple, inplace: bool = True) None | SQLDataModel
Concatenates the provided data to
SQLDataModel
along the row axis, returning a new model or modifying the existing instance inplace.- Parameters:
other (SQLDataModel | list | tuple) – The SQLDataModel, list, or tuple to concatenate or append.
inplace (bool, optional) – If True (default), performs the concatenation in-place, modifying the current model. If False, returns a new
SQLDataModel
instance with the concatenated result.
- Returns:
None
wheninplace = True
andSQLDataModel
whenin_place = False
- Return type:
None
orSQLDataModel
- Raises:
TypeError – If the
other
argument is not one of typeSQLDataModel
,list
, ortuple
.ValueError – If
other
is a list or tuple with insufficient data where the column dimension is < 1.DimensionError – If the column count of the current model does not match the column count of the
other
model or tuple.
Example:
from SQLDataModel import SQLDataModel # Datasets a and b data_a = (['A', 1], ['B', 2]) data_b = (['C', 3], ['D', 4]) # Create the models sdm_a = SQLDataModel(data_a, headers=['letter', 'number']) sdm_b = SQLDataModel(data_b, headers=['letter', 'number']) # Concatenate the two models sdm_ab = sdm_a.concat(sdm_b, inplace=False) # View result print(sdm_ab)
This will output:
┌────────┬────────┐ │ letter │ number │ ├────────┼────────┤ │ A │ 1 │ │ B │ 2 │ │ C │ 3 │ │ D │ 4 │ └────────┴────────┘ [4 rows x 2 columns]
# List or tuples can also be used directly data_e = ['E', 5] # Append in place sdm_ab.concat(data_e) # View result print(sdm_ab)
This will output:
┌───┬────────┬────────┐ │ │ letter │ number │ ├───┼────────┼────────┤ │ 0 │ A │ 1 │ │ 1 │ B │ 2 │ │ 2 │ C │ 3 │ │ 3 │ D │ 4 │ │ 4 │ E │ 5 │ └───┴────────┴────────┘ [5 rows x 2 columns]
Note
Models must be of compatible dimensions with equal
column_count
or equivalent dimension iflist
ortuple
Headers are inherited from the model calling the
SQLDataModel.concat()
method whether done inplace or being returned as new instance.
- copy(data_only: bool = False) SQLDataModel
Returns a deep copy of the current model as a new
SQLDataModel
.- Parameters:
data_only (bool) – If True, only the data is copied, otherwise display and styling parameters are included. Default is False.
- Returns:
A cloned copy from the original as a new
SQLDataModel
.- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8) ] # Create the original model with list styling sdm = SQLDataModel(data, headers, table_style='list') # Create two copies, one full and one with data only copy_full = sdm.copy() copy_data = sdm.copy(data_only=True) # View both copies print(copy_full) print(copy_data)
This will output both copies, with
copy_full
including any styling parameters such astable_style='list'
:Name Age Height - ------ --- ------- 0 John 30 175.30 1 Alice 28 162.00 2 Travis 35 185.80
With the output for
copy_data
containing only the original model’s data:┌───┬────────┬─────┬─────────┐ │ │ Name │ Age │ Height │ ├───┼────────┼─────┼─────────┤ │ 0 │ John │ 30 │ 175.30 │ │ 1 │ Alice │ 28 │ 162.00 │ │ 2 │ Travis │ 35 │ 185.80 │ └───┴────────┴─────┴─────────┘
Note
Model headers and dtypes are considered part of the model data and are included when
data_only=True
.Default behavior,
data_only=False
, includes the following additional display parameters:SQLDataModel.display_max_rows
: The maximum number of rows to display.SQLDataModel.min_column_width
: The minimum width of columns when displaying the model.SQLDataModel.max_column_width
: The maximum width of columns when displaying the model.SQLDataModel.column_alignment
: The alignment of columns (‘left’, ‘center’, ‘right’ or ‘dynamic’).SQLDataModel.display_color
: The color to use when displaying the table, None by default.SQLDataModel.display_index
: True if displaying index column, False otherwise.SQLDataModel.display_float_precision
: The precision for displaying floating-point numbers.SQLDataModel.table_style
: The table styling format to use for strng representations of the model.
- count() SQLDataModel
Returns a new
SQLDataModel
containing the counts of non-null values for each column in a row-wise orientation.- Returns:
A new SQLDataModel containing the counts of non-null values in each column.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data with missing values headers = ['Name', 'Age', 'Gender', 'Tenure'] data = [ ('Alice', 25, 'Female', 1.0), ('Bob', None, 'Male', 2.7), ('Charlie', 30, 'Male', None), ('David', None, 'Male', 3.8) ] # Create the model sdm = SQLDataModel(data, headers) # Get counts counts = sdm.count() # View result print(counts)
This will output the count of all non-null values for each column:
┌──────┬─────┬────────┬────────┐ │ Name │ Age │ Gender │ Tenure │ ├──────┼─────┼────────┼────────┤ │ 4 │ 2 │ 4 │ 3 │ └──────┴─────┴────────┴────────┘ [1 rows x 4 columns]
Note
See
SQLDataModel.count_unique()
for column-wise count of unique, null and total values for each column.
- count_unique() SQLDataModel
Returns a new
SQLDataModel
containing the total counts and unique values for each column in the model for both null and non-null values.- Metrics:
'column'
contains the names of the columns counted.'na'
contains the total number of null values in the column.'unique'
contains the total number of unique values in the column.'count'
contains the total number of non-null values in the column.'total'
contains the total number of all null and non-null values in the column.
- Returns:
A new SQLDataModel containing columns ‘column’, ‘unique’, and ‘count’ representing the column name, total unique values, and total values count, respectively.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Gender'] data = [ ('Alice', 25, 'Female'), ('Bob', 30, None), ('Alice', 25, 'Female') ] # Create the model sdm = SQLDataModel(data, headers) # Get the value count information count_model = sdm.count_unique() # View the count information print(count_model)
This will output:
┌────────┬──────┬────────┬───────┬───────┐ │ column │ na │ unique │ count │ total │ ├────────┼──────┼────────┼───────┼───────┤ │ Name │ 0 │ 2 │ 3 │ 3 │ │ Age │ 0 │ 2 │ 3 │ 3 │ │ Gender │ 1 │ 1 │ 2 │ 3 │ └────────┴──────┴────────┴───────┴───────┘ [3 rows x 5 columns]
- Change Log:
- Version 0.3.2 (2024-04-02):
Renamed method from
counts
tocount_unique
for more precise definition.
Note
See
SQLDataModel.count()
for the count of non-null values for each column in a row-wise orientation.
- data(index: bool = False, include_headers: bool = False, strict_2d: bool = False) list[tuple]
Returns the
SQLDataModel
data as a list of tuples for multiple rows, a single tuple for individual rows, as a single item for individual cells. Data is returned without index and headers by default, useinclude_headers=True
orindex=True
to modify.- Parameters:
index (bool, optional) – If True, includes the index in the result; if False, excludes the index. Default is False.
include_headers (bool, optional) – If True, includes column headers in the result; if False, excludes headers. Default is False.
strict_2d (bool, optional) – If True, returns data as a 2-dimensional list of tuples regardless of data dimension. Default is False.
- Returns:
The data currently stored in the model as a list of tuples.
- Return type:
list[tuple]
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8) ] # Create the model sdm = SQLDataModel(data, headers, display_float_precision=2) # View full table print(sdm)
This will output:
┌────────┬──────┬─────────┐ │ Name │ Age │ Height │ ├────────┼──────┼─────────┤ │ John │ 30 │ 175.30 │ │ Alice │ 28 │ 162.00 │ │ Travis │ 35 │ 185.80 │ └────────┴──────┴─────────┘ [3 rows x 3 columns]
Get data for specific row:
# Grab data from single row row_data = sdm[0].data() # View it print(row_data)
This will output the row as a tuple of values:
('John', 30, 175.3)
Get data for specific column:
# Grab data from single column col_data = sdm['Name'].data() # View it print(col_data)
This will output the column values as a list of tuples:
[('John',), ('Alice',), ('Travis',)]
- Change Log:
- Version 0.5.0 (2024-05-09):
Added
strict_2d
parameter to allow predictable return type regardless of data dimension.
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
Many other
SQLDataModel
methods rely on this method, changing it will lead to undefined behavior.See related
SQLDataModel.from_data()
for creating a newSQLDataModel
from existing data sources.Use
strict_2d = True
to always return data as a list of tuples regardless of data dimension.
- deduplicate(subset: list[str] = None, reset_index: bool = True, keep_first: bool = True, inplace: bool = True) None | SQLDataModel
Removes duplicate rows from the SQLDataModel based on the specified subset of columns. Deduplication occurs inplace by default, otherwise use
inplace=False
to return a newSQLDataModel
.- Parameters:
subset (list[str], optional) – List of columns to consider when identifying duplicates. If None, all columns are considered. Defaults to None.
reset_index (bool, optional) – If True, resets the index after deduplication starting at 0; otherwise retains current indicies.
keep_first (bool, optional) – If True, keeps the first occurrence of each duplicated row; otherwise, keeps the last occurrence. Defaults to True.
inplace (bool, optional) – If True, modifies the current SQLDataModel in-place; otherwise, returns a new SQLDataModel without duplicates. Defaults to True.
- Raises:
ValueError – If a column specified in
subset
is not found in the SQLDataModel.- Returns:
If
inplace = True
the method modifies the current SQLDataModel in-place returnNone
, otherwise ifinplace = False
a newSQLDataModel
is returned.- Return type:
None
orSQLDataModel
Examples:
Based on Single Column
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Deduplicate based on a specific column sdm.deduplicate(subset='ID', keep_first=True, inplace=True)
Based on Multiple Columns
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Deduplicate based on multiple columns and save to keep both models sdm_deduped = sdm.deduplicate(subset=['ID', 'Name'], keep_first=False, inplace=False)
Note
Ordering for
keep_first
is determined by the currentSQLDataModel.sql_idx
order of the instance.For multiple columns ordering is done sequentially favoring first index in
subset
, then i+1, …, toi+len(subset)
- describe(exclude_columns: str | list = None, exclude_dtypes: list[Literal['str', 'int', 'float', 'date', 'datetime', 'bool']] = None, ignore_na: bool = True, **kwargs) SQLDataModel
Generates descriptive statistics for columns in the
SQLDataModel
instance based on column dtype including count, unique values, top value, frequency, mean, standard deviation, minimum, 25th, 50th, 75th percentiles, maximum and dtype for specified column.- Parameters:
exclude_columns (str | list, optional) – Columns to exclude from the analysis. Default is None.
exclude_dtypes (list[Literal["str", "int", "float", "date", "datetime", "bool"]], optional) – Data types to exclude from the analysis. Default is None.
ignore_na (bool, optional) – If True, ignores NA like values (‘NA’, ‘ ‘, ‘None’) when computing statistics. Default is True.
**kwargs – Additional keyword arguments to be passed to the
execute_fetch
method.
- Statistics Described:
count
: Total number of non-null values for specified columnunique
: Total number of unique values for specified columntop
: Top value represented for specified column, ties broken arbitrarilyfreq
: Frequency of corresponding value represented in ‘top’ metricmean
: Mean as calculated by summing all values and dividing by ‘count’std
: Standard Deviation for specified columnUncorrected sample standard deviation for
int
,float
dtypesMean time difference represented in number of days for
date
,datetime
dtypes‘NaN’ for all other dtypes
min
: Minimum value for specified columnLeast value for
int
,float
dtypesLeast value sorted by alphabetical ascending for
str
dtypesEarliest date or datetime for
date
,datetime
dtypes
p25
: Percentile, 25thMax first bin value as determined by quartered binning of values for
int
,float
dtypes‘NaN’ for all other dtypes
p50
: Percentile, 50thMax second bin value as determined by quartered binning of values for
int
,float
dtypes‘NaN’ for all other dtypes
p75
: Percentile, 75thMax third bin value as determined by quartered binning of values for
int
,float
dtypes‘NaN’ for all other dtypes
max
: Maximum value for specified columnGreatest value for
int
,float
dtypesGreatest value sorted by alphabetical ascending for
str
dtypesLatest date or datetime for
date
,datetime
dtypes
dtype
: Datatype of specified columnPython datatype as determined by relevant class
__name__
attribute, e.g. ‘float’ or ‘int’dtypes can be excluded by using
exclude_dtypes
parameter
- Returns:
A new SQLDataModel containing a comprehensive set of descriptive statistics for selected columns.
- Return type:
SQLDataModel
Note
Standard deviation is calculated using uncorrected sample standard deviation for numeric dtypes, and timediff in days for datetime dtypes
Ties in unique, top and freq columns are broken arbitrarily as determined by first ordering of values prior to calling
describe()
Ties encountered when binning for p25, p50, p75 will favor lower bins for data that cannot be quartered cleanly
Metrics for count, min, p25, p50, p75 and max include non-null values only
Using
ignore_na=True
only affects inclusion of ‘NA like’ values such as empty stringsFloating point precision determined by
SQLDataModel.display_float_precision
attribute
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('employees.csv') # View all 10 rows print(sdm)
This will output:
┌───┬──────────────────┬────────────┬─────────────┬───────────────┬────────┬─────────────────────┐ │ │ name │ hire_date │ country │ service_years │ age │ last_update │ ├───┼──────────────────┼────────────┼─────────────┼───────────────┼────────┼─────────────────────┤ │ 0 │ Pamela Berg │ 2007-06-06 │ New Zealand │ 3.02 │ 56 │ 2023-08-12 17:13:46 │ │ 1 │ Mason Hoover │ 2009-04-19 │ Australia │ 5.01 │ 41 │ 2023-05-18 01:29:44 │ │ 2 │ Veda Suarez │ 2007-07-02 │ Ukraine │ 4.65 │ 26 │ 2023-12-09 15:38:01 │ │ 3 │ John Smith │ 2017-08-12 │ New Zealand │ 3.81 │ 35 │ 2023-03-10 18:23:56 │ │ 4 │ Xavier McCoy │ 2021-04-03 │ France │ 2.95 │ 42 │ 2023-09-27 11:39:08 │ │ 5 │ John Smith │ 2020-10-11 │ Germany │ 4.61 │ 56 │ 2023-12-09 18:41:52 │ │ 6 │ Abigail Mays │ 2021-07-25 │ Costa Rica │ 5.34 │ 50 │ 2023-02-11 16:43:07 │ │ 7 │ Rama Galloway │ 2009-02-09 │ Italy │ 3.87 │ 24 │ 2023-03-13 16:08:48 │ │ 8 │ Lucas Rodriquez │ 2018-06-19 │ New Zealand │ 2.73 │ 28 │ 2023-03-17 01:45:22 │ │ 9 │ Hunter Donaldson │ 2015-12-18 │ Belgium │ 4.58 │ 43 │ 2023-04-06 03:22:54 │ └───┴──────────────────┴────────────┴─────────────┴───────────────┴────────┴─────────────────────┘ [10 rows x 6 columns]
Now that we have our
SQLDataModel
, we can generate some statistics:# Generate statistics sdm_described = sdm.describe() # View stats print(sdm_described)
This will output:
┌────────┬──────────────┬─────────────┬─────────────┬───────────────┬────────┬─────────────────────┐ │ metric │ name │ hire_date │ country │ service_years │ age │ last_update │ ├────────┼──────────────┼─────────────┼─────────────┼───────────────┼────────┼─────────────────────┤ │ count │ 10 │ 10 │ 10 │ 10 │ 10 │ 10 │ │ unique │ 9 │ 10 │ 8 │ 10 │ 9 │ 10 │ │ top │ John Smith │ 2021-07-25 │ New Zealand │ 5.34 │ 56 │ 2023-12-09 18:41:52 │ │ freq │ 2 │ 1 │ 3 │ 1 │ 2 │ 1 │ │ mean │ NaN │ 2014-11-24 │ NaN │ 4.06 │ 40 │ 2023-06-16 19:18:39 │ │ std │ NaN │ 2164.4 days │ NaN │ 0.92 │ 11 │ 117.58 days │ │ min │ Abigail Mays │ 2007-06-06 │ Australia │ 2.73 │ 24 │ 2023-02-11 16:43:07 │ │ p25 │ NaN │ 2009-02-09 │ NaN │ 3.02 │ 28 │ 2023-03-13 16:08:48 │ │ p50 │ NaN │ 2017-08-12 │ NaN │ 4.58 │ 42 │ 2023-05-18 01:29:44 │ │ p75 │ NaN │ 2020-10-11 │ NaN │ 4.65 │ 50 │ 2023-09-27 11:39:08 │ │ max │ Xavier McCoy │ 2021-07-25 │ Ukraine │ 5.34 │ 56 │ 2023-12-09 18:41:52 │ │ dtype │ str │ date │ str │ float │ int │ datetime │ └────────┴──────────────┴─────────────┴─────────────┴───────────────┴────────┴─────────────────────┘ [12 rows x 7 columns]
Specific columns or data types can be excluded from result:
# Set filters to exclude all str dtypes and the 'hire_date' column: sdm_describe = sdm.describe(exclude_dtypes=['str'], exclude_columns=['hire_date']) # View statistics print(sdm_described)
This will output:
┌────────┬───────────────┬────────┬─────────────────────┐ │ metric │ service_years │ age │ last_update │ ├────────┼───────────────┼────────┼─────────────────────┤ │ count │ 10 │ 10 │ 10 │ │ unique │ 10 │ 9 │ 10 │ │ top │ 5.34 │ 56 │ 2023-10-28 05:42:43 │ │ freq │ 1 │ 2 │ 1 │ │ mean │ 4.06 │ 40 │ 2023-08-11 23:18:12 │ │ std │ 0.92 │ 11 │ 73.15 days │ │ min │ 2.73 │ 24 │ 2023-04-07 23:56:06 │ │ p25 │ 3.02 │ 28 │ 2023-06-02 14:36:19 │ │ p50 │ 4.58 │ 42 │ 2023-09-09 19:18:38 │ │ p75 │ 4.65 │ 50 │ 2023-10-09 19:34:55 │ │ max │ 5.34 │ 56 │ 2023-10-28 05:42:43 │ │ dtype │ float │ int │ datetime │ └────────┴───────────────┴────────┴─────────────────────┘ [12 rows x 4 columns]
- Change Log:
- Version 0.6.3 (2024-05-16):
Modified model to output values as string data types and set columns to right-aligned if arguments are not present in
kwargs
to retain metric resolution while having numeric alignment.
Important
Generally, do not rely on
SQLDataModel
to do statistics, useNumPy
or a real scientific computing library instead.
Note
Use
SQLDataModel.infer_dtypes()
to cast columns to their apparent data type, or set it manually withSQLDataModel.set_column_dtypes()
to convert columns to different data types.Statistics for
date
anddatetime
can be unpredictable if formatting used is inconsistent with conversion to Julian days or if column data type is incorrect.
- display_color
The display color to use for string representations of the model. Default is
None
, using the standard terminal color.- Type:
ANSIColor
- display_float_precision
The floating point precision to use for string representations of the table, does not affect the actual floating point values stored in the model. Default is 2.
- Type:
int
- display_index
Determines whether the index column is displayed when string representations of the table are generated. Default is True.
- Type:
bool
- display_max_rows
The maximum number of rows to display. Default is 1,000 rows.
- Type:
int
- drop_column(column: int | str | list, inplace: bool = True) None | SQLDataModel
Drops the specified column(s) from the
SQLDataModel
. Values forcolumn
can be a single column name or index, or a list of multiple column names or indicies to drop from the model.- Parameters:
column (int | str | list) – The index, name, or list of indices/names of the column(s) to drop.
inplace (bool) – If True, drops the column(s) in-place and updates the model metadata. If False, returns a new
SQLDataModel
object without the dropped column(s) and does not modify the original object. Default is True.
- Returns:
If inplace is True, returns None. Otherwise, returns a new
SQLDataModel
object without the dropped column(s).- Return type:
None | SQLDataModel
- Raises:
TypeError – If the column parameter is not of type ‘int’, ‘str’, or a list containing equivalent types.
IndexError – If any provided column index is outside the current column range.
ValueError – If any provided column name is not found in the model’s headers.
Examples:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Gender', 'City'] data = [ ('Alice', 30, 'Female', 'Milwaukee'), ('Sarah', 35, 'Female', 'Houston'), ('Mike', 28, 'Male', 'Atlanta'), ('John', 25, 'Male', 'Boston'), ('Bob', 22, 'Male', 'Chicago'), ] # Create the model sdm = SQLDataModel(data,headers) # Drop the 'Gender' column sdm.drop_column('Gender') # View updated model print(sdm)
This will output:
┌───────┬──────┬───────────┐ │ Name │ Age │ City │ ├───────┼──────┼───────────┤ │ Alice │ 30 │ Milwaukee │ │ Sarah │ 35 │ Houston │ │ Mike │ 28 │ Atlanta │ │ John │ 25 │ Boston │ │ Bob │ 22 │ Chicago │ └───────┴──────┴───────────┘ [5 rows x 3 columns]
Dropping multiple columns:
# Drop first and last columns by index sdm.drop_column([0,-1]) # View updated model print(sdm)
This will output:
┌──────┬────────┐ │ Age │ Gender │ ├──────┼────────┤ │ 30 │ Female │ │ 35 │ Female │ │ 28 │ Male │ │ 25 │ Male │ │ 22 │ Male │ └──────┴────────┘ [5 rows x 2 columns]
Drop columns and return as a new
SQLDataModel
:# Drop the multiple columns and return as a new model sdm = sdm.drop_column(['Age','Gender'], inplace=False) # View updated model print(sdm)
This will output:
┌───────┬───────────┐ │ Name │ City │ ├───────┼───────────┤ │ Alice │ Milwaukee │ │ Sarah │ Houston │ │ Mike │ Atlanta │ │ John │ Boston │ │ Bob │ Chicago │ └───────┴───────────┘ [5 rows x 2 columns]
Note
Arguments for
column
can be a singlestr
orint
orlist[str]
containingstr
orlist[int]
containingint
representing column names or column indicies, respectively, but they cannot be combined and provided together. For example, passingcolumns = ['First Name', 3]
will raise aTypeError
exception.The equivalent of this method can also be achieved by simply indexing the required rows and columns using
sdm[rows, column]
notation, seeSQLDataModel.__getitem__()
for additional details.
- dtypes
The current model data types mapped to each column in the format of
{'col': 'dtype'}
where'dtype'
is a string representing the corresponding python type.- Type:
dict[str, str]
- execute_fetch(sql_query: str, sql_params: tuple = None, **kwargs) SQLDataModel
Returns a new
SQLDataModel
object, including display and style properties, after executing the provided SQL query using the currentSQLDataModel
. This method is called by other methods which expect results to be returned from their execution.- Parameters:
sql_query (str) – The SQL query to execute with the expectation of rows returned.
**kwargs (optional) – Additional keyword args to pass to
SQLDataModel
constructor
- Raises:
SQLProgrammingError – If the provided SQL query is invalid or malformed.
ValueError – If the provided SQL query was valid but returned 0 rows, which is insufficient to return a new model.
- Returns:
A new
SQLDataModel
instance containing the result of the SQL query.- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['Column1', 'Column2']) # Create the SQL query to execute query = 'SELECT * FROM sdm WHERE Column1 > 10' # Fetch and save the result to a new instance result_model = sdm.execute_fetch(query) # Create a parameterized SQL query to execute query = 'SELECT * FROM sdm WHERE Column1 > ? OR Column2 < ?' params = (10, 20) # Provide the SQL and the statement parameters result_parameterized = sdm.execute_fetch(query, params)
Important
The default table name is
'sdm'
, you can useSQLDataModel.set_model_name()
to modify the name used bySQLDataModel
.
- Change Log:
- Version 0.6.2 (2024-05-15):
Inclusion of
SQLDataModel.table_style
argument in returnedSQLDataModel
to inherit all display properties in result.
Note
Use
SQLDataModel.set_model_name()
to modify the table name used by the model, default name set as'sdm'
.Display properties such as float precision, index column or table styling are also passed to the new instance when not provided in
kwargs
.
- execute_statement(sql_stmt: str) None
Executes an arbitrary SQL query against the current model without the expectation of selection or returned rows.
- Parameters:
sql_stmt (str) – The SQL query to execute.
- Raises:
SQLProgrammingError – If the SQL execution fails.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('data.csv') # Execute statement without results, modifying column in place sdm.execute_statement('UPDATE table SET column = value WHERE condition')
Note
To execute a query with the expectation of results, see
SQLDataModel.execute_fetch()
methodTo execute multiple queries within a single transaction, see
SQLDataModel.execute_transaction()
method
- execute_transaction(sql_script: str) None
Executes a prepared SQL script wrapped in a transaction against the current model without the expectation of selection or returned rows.
- Parameters:
sql_script (str) – The SQL script to execute within a transaction.
- Raises:
SQLProgrammingError – If the SQL execution fails.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('data.csv') # Script to update columns with predicate transaction_script = ''' UPDATE table1 SET column1 = value1 WHERE condition1; UPDATE table2 SET column2 = value2 WHERE condition2; ''' # Execute the script sdm.execute_transaction(transaction_script)
Note
If no errors were raised, the transaction was successfully executed as provided.
Parameters cannot be passed to as a transaction script in
sqlite3
, useSQLDataModel.execute_fetch()
method if parameter bindings are required.Many other methods heavily rely on the
SQLDataModel.execute_transaction()
method, therefore modifying it may adversly affect many other methods.
- fillna(value, strictly_null: bool = False, inplace: bool = True) None | SQLDataModel
Fills missing (na or nan) values in the current
SQLDataModel
with the providedvalue
inplace or as a new instance.- Parameters:
value – The scalar value to fill missing values with. Should be of type ‘str’, ‘int’, ‘float’, ‘bytes’, or ‘bool’.
inplace (bool) – If True, modifies the current instance in-place. If False, returns a new instance with missing values filled.
strictly_null (bool) – If True, only strictly null values are filled. If False, values like
'NA'
,'NaN'
,'n/a'
,'na'
, and whitespace only strings are also filled.
- Raises:
TypeError – If
value
is not a scalar type or is incompatible with SQLite’s type system.- Returns:
When
inplace=True
modifies model inplace, returningNone
, wheninplace=False
a newSQLDataModel
is returned.- Return type:
None
orSQLDataModel
Example:
from SQLDataModel import SQLDataModel # Create sample data data = [('Alice', 25, None), ('Bob', None, 'N/A'), ('Charlie', 'NaN', ' '), ('David', 30, 'NA')] # Create the model sdm = SQLDataModel(data, headers=['Name', 'Age', 'Status']) # Fill missing values with 0 sdm_filled = sdm.fillna(value=0, strictly_null=False, inplace=False) # View filled model print(sdm_filled)
This will output:
┌───┬─────────┬──────┬────────┐ │ │ Name │ Age │ Status │ ├───┼─────────┼──────┼────────┤ │ 0 │ Alice │ 25 │ 0 │ │ 1 │ Bob │ 0 │ 0 │ │ 2 │ Charlie │ 0 │ 0 │ │ 3 │ David │ 30 │ 0 │ └───┴─────────┴──────┴────────┘ [4 rows x 3 columns]
Note
The method supports filling missing values with various scalar types which are then adapted to the columns set dtype.
The
strictly_null
parameter controls whether additional values like('NA', 'NAN', 'n/a', 'na', '')
with last being an empty string, are treated as null.
- static flatten_json(json_source: list | dict, flatten_rows: bool = True, level_sep: str = '_', key_prefix: str = None) dict
Parses raw JSON data and flattens it into a dictionary with optional normalization.
- Parameters:
json_source (dict | list) – The raw JSON data to be parsed.
flatten_rows (bool) – If True, the data will be normalized into columns and rows. If False,
key_prefix. (columns will be concatenated from each row using the specified) –
level_sep (str) – Separates nested levels from other levels and used to concatenate prefix to column.
key_prefix (str) – The prefix to prepend to the JSON keys. If None, an empty string is used.
- Returns:
A flattened dictionary representing the parsed JSON data.
- Return type:
dict
Example:
from SQLDataModel import SQLDataModel # Sample JSON json_source = [{ "alpha": "A", "value": 1 }, { "alpha": "B", "value": 2 }, { "alpha": "C", "value": 3 }] # Flatten JSON with normalization flattened_data = flatten_json(json_data, flatten_rows=True) # Format of result flattened_data = {"alpha": ['A','B','C'], "value": [1, 2, 3]} # Alternatively, flatten columns without rows and adding a prefix flattened_data = flatten_json(raw_input,key_prefix='row_',flatten_rows=False) # Format of result flattened_data = {'row_0_alpha': 'A', 'row_0_value': 1, 'row_1_alpha': 'B', 'row_1_value': 2, 'row_2_alpha': 'C', 'row_2_value': 3}
- freeze_index(column_name: str = None) None
Freeze the current index as a new column, expanding it into the current model. The new column is unaffected by any future changes to the primary index column.
- Parameters:
column_name (str, optional) – The name for the new frozen index column. If not provided, a default name ‘frzn_id’ will be used.
- Raises:
TypeError – If the provided
column_name
is not of type ‘str’.- Returns:
None
Example:
from SQLDataModel import SQLDataModel headers = ['first', 'last', 'age', 'service', 'hire_date'] data = [ ('John', 'Smith', 27, 1.22, '2023-02-01'), ('Sarah', 'West', 39, 0.7, '2023-10-01'), ('Mike', 'Harlin', 36, 3.9, '2020-08-27'), ('Pat', 'Douglas', 42, 11.5, '2015-11-06'), ('Kelly', 'Lee', 32, 8.0, '2016-09-18') ] # Create the model sdm = SQLDataModel(data, headers) # Freeze index as new column 'id' sdm.freeze_index("id") # View model print(sdm)
This will output:
┌───┬───────┬─────────┬──────┬─────────┬────────────┬──────┐ │ │ first │ last │ age │ service │ hire_date │ id │ ├───┼───────┼─────────┼──────┼─────────┼────────────┼──────┤ │ 0 │ John │ Smith │ 27 │ 1.22 │ 2023-02-01 │ 0 │ │ 1 │ Sarah │ West │ 39 │ 0.70 │ 2023-10-01 │ 1 │ │ 2 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ 2 │ │ 3 │ Pat │ Douglas │ 42 │ 11.50 │ 2015-11-06 │ 3 │ │ 4 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ 4 │ └───┴───────┴─────────┴──────┴─────────┴────────────┴──────┘ [5 rows x 6 columns]
Note
Freezing the index will assign the current
SQLDataModel.sql_idx
for each row as a new column, leaving the current index in place.To modify the actual
SQLDataModel.sql_idx
value, use theSQLDataModel.reset_index()
method instead.
- classmethod from_csv(csv_source: str, infer_types: bool = True, encoding: str = 'Latin1', delimiter: str = ',', quotechar: str = '"', headers: list[str] = None, **kwargs) SQLDataModel
Returns a new
SQLDataModel
generated from the provided CSV source, which can be either a file path or a raw delimited string.- Parameters:
csv_source (str) – The path to the CSV file or a raw delimited string.
infer_types (bool, optional) – Infer column types based on random subset of data. Default is True, when False, all columns are str type.
encoding (str, optional) – The encoding used to decode the CSV source if it is a file. Default is ‘Latin1’.
delimiter (str, optional) – The delimiter to use when parsing CSV source. Default is
,
.quotechar (str, optional) – The character used for quoting fields. Default is
"
.headers (List[str], optional) – List of column headers. If None, the first row of the CSV source is assumed to contain headers.
**kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the provided CSV source.
- Return type:
SQLDataModel
- Raises:
ValueError – If no delimited data is found in
csv_source
or if parsing with delimiter does not yield valid tabular data.Exception – If an error occurs while attempting to read from or process the provided CSV source.
Examples:
From CSV File
from SQLDataModel import SQLDataModel # CSV file path or raw CSV string csv_source = "/path/to/data.csv" # Create the model using the CSV file, providing custom headers sdm = SQLDataModel.from_csv(csv_source, headers=['ID', 'Name', 'Value'])
From CSV Literal
from SQLDataModel import SQLDataModel # CSV data data = ''' A, B, C 1a, 1b, 1c 2a, 2b, 2c 3a, 3b, 3c ''' # Create the model sdm = SQLDataModel.from_csv(data) # View result print(sdm)
This will output:
┌──────┬──────┬──────┐ │ A │ B │ C │ ├──────┼──────┼──────┤ │ 1a │ 1b │ 1c │ │ 2a │ 2b │ 2c │ │ 3a │ 3b │ 3c │ └──────┴──────┴──────┘ [3 rows x 3 columns]
- Change Log:
- Version 0.4.0 (2024-04-23):
Modifed to only parse CSV files and removed all delimiter sniffing with introduction of new method
SQLDataModel.from_delimited()
to handle other delimiters.Renamed
delimiters
parameter todelimiter
with,
set as new default to reflect revised focus on CSV files only.
Note
If
csv_source
is delimited by characters other than those specified, useSQLDataModel.from_delimited()
and provide delimiter todelimiters
.If
headers
are provided, the first row parsed from source will be the first row in the table and not discarded.- The
infer_types
argument can be used to infer the appropriate data type for each column: If
infer_types = True
, a random subset of the data will be used to infer the correct type and cast values accordinglyIf
infer_types = False
, values from the first row only will be used to assign types, almost always ‘str’ when reading from CSV.
- The
- classmethod from_data(data: Any = None, **kwargs) SQLDataModel
Convenience method to infer the source of
data
and return the appropriate constructor method to generate a newSQLDataModel
instance.- Parameters:
data (Any, required) – The input data from which to create the SQLDataModel object.
**kwargs – Additional keyword arguments to be passed to the constructor method, see init method for arguments.
- Constructor methods are called according to the input type:
dict
: If all values are python datatypes, passed asdtypes
to constructor, otherwise asdata
toSQLDataModel.from_dict()
.list
: If single dimension, passed asheaders
to constructor, otherwise asdata
containing list of lists.tuple
: Same as with list, if single dimension passed asheaders
, otherwise asdata
containing tuple of lists.numpy.ndarray
: passed toSQLDataModel.from_numpy()
as array data.pandas.DataFrame
: passed toSQLDataModel.from_pandas()
as dataframe data.polars.DataFrame
: passed toSQLDataModel.from_polars()
as dataframe data.str
: If starts with ‘http’, passed toSQLDataModel.from_html()
as url, otherwise:'.csv'
: passed toSQLDataModel.from_csv()
as csv source data.'.html'
: passed toSQLDataModel.from_html()
as html source data.'.json'
: passed toSQLDataModel.from_json()
as json source data.'.md'
: passed toSQLDataModel.from_markdown()
as markdown source data.'.parquet'
: passed toSQLDataModel.from_parquet()
as parquet source data.'.pkl'
: passed toSQLDataModel.from_pickle()
as pickle source data.'.sdm'
: passed toSQLDataModel.from_pickle()
as pickle source data.'.tex'
: passed toSQLDataModel.from_latex()
as latex source data.'.tsv'
: passed toSQLDataModel.from_csv()
as csv source data.'.txt'
: passed toSQLDataModel.from_text()
as text source data.'.xlsx'
: passed toSQLDataModel.from_excel()
as excel source data.
- Returns:
The SQLDataModel object created from the provided data.
- Return type:
SQLDataModel
- Raises:
TypeError – If the type of
data
is not supported.ValueError – If the file extension is not found, unsupported, or if the SQL extension is not supported.
Exception – If an OS related error occurs during file read operations if
data
is a filepath.
Example:
from SQLDataModel import SQLDataModel # Create SQLDataModel from a CSV file sdm_csv = SQLDataModel.from_data("data.csv", headers=['ID', 'Name', 'Value']) # Create SQLDataModel from a dictionary sdm_dict = SQLDataModel.from_data({"ID": int, "Name": str, "Value": float}) # Create SQLDataModel from a list of tuples sdm_list = SQLDataModel.from_data([(1, 'Alice', 100.0), (2, 'Bob', 200.0)], headers=['ID', 'Name', 'Value']) # Create SQLDataModel from raw string literal delimited_literal = ''' A, B, C 1, 2, 3 4, 5, 6 7, 8, 9 ''' # Create the model by having correct constructor inferred sdm = SQLDataModel.from_data(delimited_literal) # View output print(sdm)
This will output:
┌────┬────┬────┐ │ A │ B │ C │ ├────┼────┼────┤ │ 1 │ 2 │ 3 │ │ 4 │ 5 │ 6 │ │ 7 │ 8 │ 9 │ └────┴────┴────┘ [3 rows x 3 columns]
Note
This method attempts to infer the correct method to call based on
data
argument, if one cannot be inferred an exception is raised.For data type specific implementation or examples, see related method for appropriate data type.
- classmethod from_delimited(source: str, infer_types: bool = True, encoding: str = 'Latin1', delimiters: str = ', \t;|:', quotechar: str = '"', headers: list[str] = None, **kwargs) SQLDataModel
Returns a new
SQLDataModel
generated from the provided delimited source, which can be either a file path or a raw delimited string.- Parameters:
source (str) – The path to the delimited file or a raw delimited string.
infer_types (bool, optional) – Infer column types based on random subset of data. Default is True, when False, all columns are str type.
encoding (str, optional) – The encoding used to decode the source if it is a file. Default is
'Latin1'
.delimiters (str, optional) – Possible delimiters. Default is
\s
,\t
,;
,|
,:
or,
(space, tab, semicolon, pipe, colon or comma).quotechar (str, optional) – The character used for quoting fields. Default is
"
.headers (list[str], optional) – List of column headers. If None, the first row of the delimited source is assumed to be the header row.
**kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the provided CSV source.
- Return type:
SQLDataModel
- Raises:
ValueError – If no delimiter is found in
source
or if parsing with delimiter does not yield valid tabular data.Exception – If an error occurs while attempting to read from or process the provided CSV source.
Example:
From Delimited Literal
from SQLDataModel import SQLDataModel # Space delimited literal source_data = ''' Name Age Height Beth 27 172.4 Kate 28 162.0 John 30 175.3 Will 35 185.8''' # Create the model sdm = SQLDataModel.from_delimited(source_data) # View output print(sdm)
This will output:
┌──────┬─────┬─────────┐ │ Name │ Age │ Height │ ├──────┼─────┼─────────┤ │ Beth │ 27 │ 172.40 │ │ Kate │ 28 │ 162.00 │ │ John │ 30 │ 175.30 │ │ Will │ 35 │ 185.80 │ └──────┴─────┴─────────┘ [4 rows x 3 columns]
From Delimited File
from SQLDataModel import SQLDataModel # Tab separated file tsv_file = 'persons.tsv' # Create the model sdm = SQLDataModel.from_delimited(tsv_file)
Note
Use
SQLDataModel.from_csv()
if delimiter in source is already known and available as this method requires more compute to determine a plausible delimiter.Use
SQLDataModel.from_text()
if data is not delimited but is a string representation such as an ASCII table or the output from anotherSQLDataModel
instance.If file is delimited by delimiters other than the default targets
\s
,\t
,;
,|
,:
or,
(space, tab, semicolon, pipe, colon or comma) make sure they are provided as single character values todelimiters
.
- classmethod from_dict(data: dict | list, **kwargs) SQLDataModel
Create a new
SQLDataModel
instance from the provided dictionary.- Parameters:
data (dict) – The dictionary or list of dictionaries to convert to SQLDataModel. If keys are of type int, they will be used as row indexes; otherwise, keys will be used as headers.
**kwargs – Additional arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the provided dictionary.
- Return type:
SQLDataModel
- Raises:
TypeError – If the provided dictionary values are not of type ‘list’, ‘tuple’, or ‘dict’.
ValueError – If the provided data appears to be a list of dicts but is empty.
Example:
from SQLDataModel import SQLDataModel # Sample data with column orientation data = { 'Name': ['Beth', 'John', 'Alice', 'Travis'], 'Height': [172.4, 175.3, 162.0, 185.8], 'Age': [27, 30, 28, 35] } # Create the model sdm = SQLDataModel.from_dict(data) # View it print(sdm)
This will output:
┌────────┬─────────┬─────┐ │ Name │ Height │ Age │ ├────────┼─────────┼─────┤ │ Beth │ 172.40 │ 27 │ │ John │ 175.30 │ 30 │ │ Alice │ 162.00 │ 28 │ │ Travis │ 185.80 │ 35 │ └────────┴─────────┴─────┘ [4 rows x 3 columns]
We can also create a model using a dictionary with row orientation:
from SQLDataModel import SQLDataModel # Sample data with row orientation data = { 0: ['Mercury', 0.38] ,1: ['Venus', 0.91] ,2: ['Earth', 1.00] ,3: ['Mars', 0.38] } # Create the model with custom headers sdm = SQLDataModel.from_dict(data, headers=['Planet', 'Gravity']) # View output print(sdm)
This will output the model created using row-wise dictionary data:
┌─────────┬─────────┐ │ Planet │ Gravity │ ├─────────┼─────────┤ │ Mercury │ 0.38 │ │ Venus │ 0.91 │ │ Earth │ 1.00 │ │ Mars │ 0.38 │ └─────────┴─────────┘ [4 rows x 2 columns]
- Change Log:
- Version 0.6.3 (2024-05-16):
Modified to try parsing input data as JSON if initial inspection does not signify row or column orientation.
Note
If data orientation suggests JSON like structure, then
SQLDataModel.from_json()
will attempt to construct the model.Dictionaries in list like orientation can also be used with structures similar to JSON objects.
The method determines the structure of the SQLDataModel based on the format of the provided dictionary.
If the keys are integers, they are used as row indexes; otherwise, keys are used as headers.
See
SQLDataModel.to_dict()
for converting existing instances ofSQLDataModel
to dictionaries.
- classmethod from_excel(filename: str, worksheet: int | str = 0, min_row: int | None = None, max_row: int | None = None, min_col: int | None = None, max_col: int | None = None, headers: list[str] = None, **kwargs) SQLDataModel
Returns a new
SQLDataModel
instance from the specified Excel file.- Parameters:
filename (str) – The file path to the Excel file, e.g.,
filename = 'titanic.xlsx'
.worksheet (int | str, optional) – The index or name of the worksheet to read from. Defaults to 0, indicating the first worksheet.
min_row (int | None, optional) – The minimum row number to start reading data from. Defaults to None, indicating the first row.
max_row (int | None, optional) – Maximum row index (1-based) to import. Defaults to None, indicating all rows are read.
min_col (int | None, optional) – Minimum column index (1-based) to import. Defaults to None, indicating the first column.
max_col (int | None, optional) – Maximum column index (1-based) to import. Defaults to None, indicating all the columns are read.
headers (List[str], optional) – The column headers for the data. Default is None, using the first row of the Excel sheet as headers.
**kwargs – Additional keyword arguments to pass to the
SQLDataModel
constructor.
- Raises:
ModuleNotFoundError – If the required package
openpyxl
is not installed as determined by_has_xl
flag.TypeError – If the
filename
argument is not of type ‘str’ representing a valid Excel file path.Exception – If an error occurs during Excel read and write operations related to openpyxl processing.
- Returns:
A new instance of
SQLDataModel
created from the Excel file.- Return type:
SQLDataModel
Examples:
We’ll use this Excel file,
data.xlsx
, as the source for the below examples:┌───────┬─────┬────────┬───────────┐ │ A │ B │ C │ D │ ┌───┼───────┼─────┼────────┼───────────┤ │ 1 │ Name │ Age │ Gender │ City │ │ 2 │ John │ 25 │ Male │ Boston │ │ 3 │ Alice │ 30 │ Female │ Milwaukee │ │ 4 │ Bob │ 22 │ Male │ Chicago │ │ 5 │ Sarah │ 35 │ Female │ Houston │ │ 6 │ Mike │ 28 │ Male │ Atlanta │ └───┴───────┴─────┴────────┴───────────┘ [ Sheet1 ]
Example 1: Load Excel file with default parameters
from SQLDataModel import SQLDataModel # Create the model using the default parameters sdm = SQLDataModel.from_excel('data.xlsx') # View imported data print(sdm)
This will output all of the data starting from ‘A1’:
┌───────┬──────┬────────┬───────────┐ │ Name │ Age │ Gender │ City │ ├───────┼──────┼────────┼───────────┤ │ John │ 25 │ Male │ Boston │ │ Alice │ 30 │ Female │ Milwaukee │ │ Bob │ 22 │ Male │ Chicago │ │ Sarah │ 35 │ Female │ Houston │ │ Mike │ 28 │ Male │ Atlanta │ └───────┴──────┴────────┴───────────┘ [5 rows x 4 columns]
Example 2: Load Excel file from specific worksheet
from SQLDataModel import SQLDataModel # Create the model from 'Sheet2' sdm = SQLDataModel.from_excel('data.xlsx', worksheet='Sheet2') # View imported data print(sdm)
This will output the contents of ‘Sheet2’:
┌────────┬───────┐ │ Gender │ count │ ├────────┼───────┤ │ Male │ 3 │ │ Female │ 2 │ └────────┴───────┘ [2 rows x 2 columns]
Example 3: Load Excel file with custom headers starting from different row
from SQLDataModel import SQLDataModel # Use our own headers instead of the Excel ones new_cols = ['Col A', 'Col B', 'Col C', 'Col D'] # Create the model starting from the 2nd row to ignore the original headers sdm = SQLDataModel.from_excel('data.xlsx', min_row=2, headers=new_cols) # View the data print(sdm)
This will output the data with our renamed headers:
┌───────┬───────┬────────┬───────────┐ │ Col A │ Col B │ Col C │ Col D │ ├───────┼───────┼────────┼───────────┤ │ John │ 25 │ Male │ Boston │ │ Alice │ 30 │ Female │ Milwaukee │ │ Bob │ 22 │ Male │ Chicago │ │ Sarah │ 35 │ Female │ Houston │ │ Mike │ 28 │ Male │ Atlanta │ └───────┴───────┴────────┴───────────┘ [5 rows x 4 columns]
Example 4: Load Excel file with specific subset of columns
from SQLDataModel import SQLDataModel # Create the model using the middle two columns only sdm = SQLDataModel.from_excel('data.xlsx', min_col=2, max_col=3) # View the data print(sdm)
This will output only the middle two columns:
┌──────┬────────┐ │ Age │ Gender │ ├──────┼────────┤ │ 25 │ Male │ │ 30 │ Female │ │ 22 │ Male │ │ 35 │ Female │ │ 28 │ Male │ └──────┴────────┘ [5 rows x 2 columns]
Note
This method entirely relies on
openpyxl
, see their amazing documentation for further information on Excel file handling in python.If custom
headers
are provided using the defaultmin_row
, then the original headers, if present, will be duplicated.All indicies for
min_row
,max_row
,min_col
andmax_col
are 1-based instead of 0-based, again seeopenpyxl
for more details.See related
SQLDataModel.to_excel()
for exporting an existingSQLDataModel
to Excel.
- classmethod from_html(html_source: str, encoding: str = 'utf-8', table_identifier: int | str = 0, infer_types: bool = True, **kwargs) SQLDataModel
Parses HTML table element from one of three possible sources: web page at url, local file at path, raw HTML string literal. If
table_identifier
is not specified, the first <table> element successfully parsed is returned, otherwise iftable_identifier
is astr
, the parser will return the corresponding ‘id’ or ‘name’ HTML attribute that matches the identifier specified. Iftable_identifier
is anint
, the parser will return the table matched as a sequential index after parsing all <table> elements from the top of the page down, starting at ‘0’. By default, the first <table> element found is returned iftable_identifier
is not specified.- Parameters:
html_source (str) – The HTML source, which can be a URL, a valid path to an HTML file, or a raw HTML string. If starts with ‘http’, the argument is considered a url and the table will be parsed from returned the web request. If is a valid file path, the argument is considered a local file and the table will be parsed from its html. If is not a valid url or path, the argument is considered a raw HTML string and the table will be parsed directly from the input.
encoding (str) – The encoding to use for reading HTML when
html_source
is considered a valid url or file path (default is ‘utf-8’).table_identifier (int | str) – An identifier to specify which table to parse if there are multiple tables in the HTML source (default is 0).
infer_types (bool, optional) – If column data types should be inferred in the return model. Default is True, meaning column types will be inferred otherwise are returned as ‘str’ types. If is
int
, identifier is treated as the indexed location of the <table> element on the page from top to bottom starting from zero and will return the corresponding position when encountered. If isstr
, identifier is treated as a target HTML ‘id’ or ‘name’ attribute to search for and will return the first case-insensitive match when encountered.**kwargs – Additional keyword arguments to pass when using
urllib.request.urlopen
to fetch HTML from a URL.
- Returns:
A new SQLDataModel instance containing the data from the parsed HTML table.
- Return type:
SQLDataModel
- Raises:
TypeError – If
html_source
is not of typestr
representing a possible url, filepath or raw HTML stream.HTTPError – Raised from
urllib
whenhtml_source
is considered a url and an HTTP exception occurs.URLError – Raised from
urllib
whenhtml_source
is considered a url and a URL exception occurs.ValueError – If no <table> elements are found or if the targeted
table_identifier
is not found.OSError – Related exceptions that may be raised when
html_source
is considered a file path.
Examples:
From Website URL
from SQLDataModel import SQLDataModel # From URL url = 'https://en.wikipedia.org/wiki/1998_FIFA_World_Cup' # Lets get the 94th table from the 1998 World Cup sdm = SQLDataModel.from_html(url, table_identifier=94) # View result: print(sdm)
This will output:
┌────┬─────────────┬────┬────┬────┬────┬────┬────┬────┬─────┬──────┐ │ R │ Team │ G │ P │ W │ D │ L │ GF │ GA │ GD │ Pts. │ ├────┼─────────────┼────┼────┼────┼────┼────┼────┼────┼─────┼──────┤ │ 1 │ France │ C │ 7 │ 6 │ 1 │ 0 │ 15 │ 2 │ +13 │ 19 │ │ 2 │ Brazil │ A │ 7 │ 4 │ 1 │ 2 │ 14 │ 10 │ +4 │ 13 │ │ 3 │ Croatia │ H │ 7 │ 5 │ 0 │ 2 │ 11 │ 5 │ +6 │ 15 │ │ 4 │ Netherlands │ E │ 7 │ 3 │ 3 │ 1 │ 13 │ 7 │ +6 │ 12 │ │ 5 │ Italy │ B │ 5 │ 3 │ 2 │ 0 │ 8 │ 3 │ +5 │ 11 │ │ 6 │ Argentina │ H │ 5 │ 3 │ 1 │ 1 │ 10 │ 4 │ +6 │ 10 │ │ 7 │ Germany │ F │ 5 │ 3 │ 1 │ 1 │ 8 │ 6 │ +2 │ 10 │ │ 8 │ Denmark │ C │ 5 │ 2 │ 1 │ 2 │ 9 │ 7 │ +2 │ 7 │ └────┴─────────────┴────┴────┴────┴────┴────┴────┴────┴─────┴──────┘ [8 rows x 11 columns]
From Local File
from SQLDataModel import SQLDataModel # From HTML file sdm = SQLDataModel.from_html('path/to/file.html') # View output print(sdm)
This will output:
┌─────────────┬────────┬──────┐ │ Team │ Points │ Rank │ ├─────────────┼────────┼──────┤ │ Brazil │ 63.7 │ 1 │ │ England │ 50.7 │ 2 │ │ Spain │ 50.0 │ 3 │ │ Germany [a] │ 49.3 │ 4 │ │ Mexico │ 47.3 │ 5 │ │ France │ 46.0 │ 6 │ │ Italy │ 44.3 │ 7 │ │ Argentina │ 44.0 │ 8 │ └─────────────┴────────┴──────┘ [8 rows x 3 columns]
From Raw HTML
from SQLDataModel import SQLDataModel # Raw HTML raw_html = '''<table id="find-me"> <tr> <th>Col 1</th> <th>Col 2</th> </tr> <tr> <td>A</td> <td>1</td> </tr> <tr> <td>B</td> <td>2</td> </tr> <tr> <td>C</td> <td>3</td> </tr> </table>''' # Create the model and search for id attribute sdm = SQLDataModel.from_html(raw_html, table_identifier="find-me") # View output print(sdm)
This will output:
┌───┬───────┬───────┐ │ │ Col 1 │ Col 2 │ ├───┼───────┼───────┤ │ 1 │ B │ 2 │ │ 2 │ C │ 3 │ └───┴───────┴───────┘ [3 rows x 2 columns]
Note
**kwargs
passed to method are used inurllib.request.urlopen
ifhtml_source
is being considered as a web url.**kwargs
passed to method are used inopen
ifhtml_source
is being considered as a filepath.The largest row size encountered will be used as the
column_count
for the returnedSQLDataModel
, rows will be padded withNone
if less.See
SQLDataModel.generate_html_table_chunks()
for initial source chunking before content fed toSQLDataModel.HTMLParser
.
- classmethod from_json(json_source: str | list | dict, encoding: str = 'utf-8', **kwargs) SQLDataModel
Creates a new
SQLDataModel
instance from JSON file path or JSON-like source, flattening if required.- Parameters:
json_source (str | list | dict) – The JSON source. If a string, it can represent a file path or a JSON-like object.
encoding (str) – The encoding to use when reading from a file. Defaults to ‘utf-8’.
**kwargs – Additional keyword arguments to pass to the
SQLDataModel
constructor.
- Returns:
A new SQLDataModel instance created from the JSON source.
- Return type:
SQLDataModel
- Raises:
TypeError – If the
json_source
argument is not of type ‘str’, ‘list’, or ‘dict’.OSError – If related exception occurs when trying to open and read from
json_source
as file path.
Examples:
From JSON String Literal
from SQLDataModel import SQLDataModel # Sample JSON string json_data = '''[{ "id": 1, "color": "red", "value": "#f00" }, { "id": 2, "color": "green", "value": "#0f0" }, { "id": 3, "color": "blue", "value": "#00f" }]''' # Create the model sdm = SQLDataModel.from_json(json_data) # View result print(sdm)
This will output:
┌──────┬───────┬───────┐ │ id │ color │ value │ ├──────┼───────┼───────┤ │ 1 │ red │ #f00 │ │ 2 │ green │ #0f0 │ │ 3 │ blue │ #00f │ └──────┴───────┴───────┘ [3 rows x 3 columns]
From JSON-like Object
from SQLDataModel import SQLDataModel # JSON-like sample json_data = [{ "alpha": "A", "value": "1" }, { "alpha": "B", "value": "2" }, { "alpha": "C", "value": "3" }] # Create the model sdm = SQLDataModel.from_json(json_data) # Output print(sdm)
This will output:
┌───────┬───────┐ │ alpha │ value │ ├───────┼───────┤ │ A │ 1 │ │ B │ 2 │ │ C │ 3 │ └───────┴───────┘ [3 rows x 2 columns]
From JSON file
from SQLDataModel import SQLDataModel # JSON file path json_data = 'data/json-sample.json' # Create the model sdm = SQLDataModel.from_json(json_data, encoding='latin-1') # View output print(sdm)
This will output:
┌──────┬────────┬───────┬─────────┐ │ id │ color │ value │ notes │ ├──────┼────────┼───────┼─────────┤ │ 1 │ red │ #f00 │ primary │ │ 2 │ green │ #0f0 │ │ │ 3 │ blue │ #00f │ primary │ │ 4 │ cyan │ #0ff │ │ │ 5 │ yellow │ #ff0 │ │ │ 5 │ black │ #000 │ │ └──────┴────────┴───────┴─────────┘ [6 rows x 4 columns]
Note
If
json_source
is deeply-nested it will be flattened according to the staticmethodSQLDataModel.flatten_json()
If
json_source
is a JSON-like string object that is not an array, it will be wrapped according as an array.
- classmethod from_latex(latex_source: str, table_identifier: int = 1, encoding: str = 'utf-8', **kwargs) SQLDataModel
Creates a new
SQLDataModel
instance from the provided LaTeX file or raw literal.- Parameters:
latex_source (str) – The LaTeX source containing one or more LaTeX tables. If
latex_source
is a valid system filepath, source will be treated as a.tex
file and parsed. Iflatex_source
is not a valid filepath, source will be parsed as raw LaTeX literal.table_identifier (int, optional) – The index position of the LaTeX table to extract. Default is 1.
encoding (str, optional) – The file encoding to use if source is a LaTex filepath. Default is ‘utf-8’;.
**kwargs – Additional keyword arguments to be passed to the
SQLDataModel
constructor.
- Returns:
The
SQLDataModel
instance created from the parsed LaTeX table.- Return type:
SQLDataModel
- Raises:
TypeError – If the
latex_source
argument is not of type ‘str’, or if thetable_identifier
argument is not of type ‘int’.ValueError – If the
table_identifier
argument is less than 1, or if no tables are found in the LaTeX source.IndexError – If the
table_identifier
is greater than the number of tables found in the LaTeX source.
- Table Indicies:
In the last example,
sdm
will contain the data from the second table found in the LaTeX content.Tables are indexed starting from index 1 at the top of the LaTeX content, incremented as they are found.
LaTeX parsing stops after the table specified at
table_identifier
is found without parsing the remaining content.
Examples:
From LaTeX literal
from SQLDataModel import SQLDataModel # Raw LaTeX literal latex_content = ''' \begin{tabular}{|l|r|r|} \hline {Name} & {Age} & {Height} \\ \hline John & 30 & 175.30 \\ Alice & 28 & 162.00 \\ Michael & 35 & 185.80 \\ \hline \end{tabular} ''' # Create the model from the LaTeX sdm = SQLDataModel.from_latex(latex_content) # View result print(sdm)
This will output:
┌─────────┬──────┬─────────┐ │ Name │ Age │ Height │ ├─────────┼──────┼─────────┤ │ John │ 30 │ 175.30 │ │ Alice │ 28 │ 162.00 │ │ Michael │ 35 │ 185.80 │ └─────────┴──────┴─────────┘ [3 rows x 3 columns]
From LaTeX file
from SQLDataModel import SQLDataModel # Load LaTeX content from file latex_file = 'path/to/latex/file.tex' # Create the model using the path sdm = SQLDataModel.from_latex(latex_file)
Specifying table identifier
from SQLDataModel import SQLDataModel # Raw LaTeX literal with multiple tables latex_content = ''' %% LaTeX with a Table \begin{tabular}{|l|l|} \hline {Header A} & {Header B} \\ \hline Value A1 & Value B1 \\ Value A2 & Value B2 \\ \hline \end{tabular} %% Then another Table \begin{tabular}{|l|l|} \hline {Header X} & {Header Y} \\ \hline Value X1 & Value Y1 \\ Value X2 & Value Y2 \\ \hline \end{tabular} ''' # Create the model from the 2nd table sdm = SQLDataModel.from_latex(latex_content, table_identifier=2) # View output print(sdm)
This will output:
┌──────────┬──────────┐ │ Header X │ Header Y │ ├──────────┼──────────┤ │ Value X1 │ Value Y1 │ │ Value X2 │ Value Y2 │ └──────────┴──────────┘ [2 rows x 2 columns]
Note
LaTeX tables are identified based on the presence of tabular environments:
\begin{tabular}...\end{tabular}
.The
table_identifier
specifies which table to extract when multiple tables are present, beginning at position ‘1’ from the top of the source.The provided
kwargs
are passed to theSQLDataModel
constructor for additional parameters to the instance returned.
- classmethod from_markdown(markdown_source: str, table_identifier: int = 1, **kwargs) SQLDataModel
Creates a new
SQLDataModel
instance from the provided Markdown source file or raw content.If
markdown_source
is a valid system path, the markdown file will be parsed. Otherwise, the provided string will be parsed as raw markdown.- Parameters:
markdown_source (str) – The Markdown source file path or raw content.
table_identifier (int, optional) – The index position of the markdown table to extract. Default is 1.
**kwargs – Additional keyword arguments to be passed to the
SQLDataModel
constructor.
- Raises:
TypeError – If the
markdown_source
argument is not of type ‘str’, or if thetable_identifier
argument is not of type ‘int’.ValueError – If the
table_identifier
argument is less than 1, or if no tables are found in the markdown source.IndexError – If the
table_identifier
is greater than the number of tables found in the markdown source.
- Returns:
The SQLDataModel instance created from the parsed markdown table.
- Return type:
SQLDataModel
- Table indicies:
In the last example,
sdm
will contain the data from the second table found in the markdown content.Tables are indexed starting from index 1 at the top of the markdown content, incremented as they are found.
Markdown parsing stops after the table specified at
table_identifier
is found without parsing the remaining content.
Examples:
From Markdown Literal
from SQLDataModel import SQLDataModel # Raw markdown literal markdown_content = ''' | Item | Price | # In stock | |---------------|-------|------------| | Juicy Apples | 1.99 | 37 | | Bananas | 1.29 | 52 | | Pineapple | 3.15 | 14 | ''' # Create the model from the markdown sdm = SQLDataModel.from_markdown(markdown_content) # View result print(sdm)
This will output:
┌──────────────┬───────┬────────────┐ │ Item │ Price │ # In stock │ ├──────────────┼───────┼────────────┤ │ Juicy Apples │ 1.99 │ 37 │ │ Bananas │ 1.29 │ 52 │ │ Pineapple │ 3.15 │ 14 │ └──────────────┴───────┴────────────┘ [3 rows x 3 columns]
From Markdown File
from SQLDataModel import SQLDataModel # Load markdown content from file markdown_file_path = 'path/to/markdown_file.md' # Create the model using the path sdm = SQLDataModel.from_markdown(markdown_file_path)
Specifying Table Identifier
from SQLDataModel import SQLDataModel # Raw markdown literal with multiple tables markdown_content = ''' ### Markdown with a Table | Header A | Header B | |----------|----------| | Value A1 | Value B1 | | Value A2 | Value B2 | ### Then another Table | Header X | Header Y | |----------|----------| | Value X1 | Value Y1 | | Value X2 | Value Y2 | ''' # Create the model from the 2nd table sdm = SQLDataModel.from_markdown(markdown_content, table_identifier=2) # View output print(sdm)
This will output:
┌──────────┬──────────┐ │ Header X │ Header Y │ ├──────────┼──────────┤ │ Value X1 │ Value Y1 │ │ Value X2 │ Value Y2 │ └──────────┴──────────┘ [2 rows x 2 columns]
Note
Markdown tables are identified based on the presence of pipe characters
|
defining table cells.The
table_identifier
specifies which table to extract when multiple tables are present, beginning at position ‘1’ from the top of the source.Escaped pipe characters
\|
within the markdown are replaced with the HTML entity reference|
for proper parsing.The provided
kwargs
are passed to theSQLDataModel
constructor for additional parameters to the instance returned.
- classmethod from_numpy(array, headers: list[str] = None, **kwargs) SQLDataModel
Returns a
SQLDataModel
object created from the provided numpyarray
.- Parameters:
array (numpy.ndarray) – The numpy array to convert to a SQLDataModel.
headers (list of str, optional) – The list of headers to use for the SQLDataModel. If None, no headers will be used, and the data will be treated as an n-dimensional array. Default is None.
**kwargs – Additional arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the numpy array.
- Return type:
SQLDataModel
- Raises:
ModuleNotFoundError – If the required package
numpy
is not found.TypeError – If
array
argument is not of typenumpy.ndarray
.DimensionError – If
array.ndim != 2
representing a (row, column) tabular array.
Example:
import numpy as np from SQLDataModel import SQLDataModel # Sample array arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # Create the model with custom headers sdm = SQLDataModel.from_numpy(arr, headers=['Col A', 'Col B', 'Col C]) # View output print(sdm)
This will output:
┌───────┬───────┬───────┐ │ Col A │ Col B │ Col C │ ├───────┼───────┼───────┤ │ 1 │ 2 │ 3 │ │ 4 │ 5 │ 6 │ │ 7 │ 8 │ 9 │ └───────┴───────┴───────┘ [3 rows x 3 columns]
Note
Numpy array must have ‘2’ dimensions, the first representing the rows, and the second the columns.
If no headers are provided, default headers will be generated as ‘col_N’ where N represents the column integer index.
- classmethod from_pandas(df, headers: list[str] = None, **kwargs) SQLDataModel
Returns a
SQLDataModel
object created from the provideddf
representing a PandasDataFrame
object. Note thatpandas
must be installed in order to use this method.- Parameters:
df (pandas.DataFrame) – The pandas DataFrame to convert to a SQLDataModel.
headers (list[str], optional) – The list of headers to use for the SQLDataModel. Default is None, using the columns from the
df
object.**kwargs – Additional arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the pandas DataFrame.
- Return type:
SQLDataModel
- Raises:
ModuleNotFoundError – If the required package
pandas
is not found.TypeError – If
df
argument is not of typepandas.DataFrame
.
Example:
import pandas as pd from SQLDataModel import SQLDataModel # Create a pandas DataFrame df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']}) # Create the model sdm = SQLDataModel.from_pandas(df)
Note
If
headers
are not provided, the existing pandas columns will be used as the newSQLDataModel
headers.
- classmethod from_parquet(filename: str, **kwargs) SQLDataModel
Returns a new
SQLDataModel
instance from the specified parquet file.- Parameters:
filename (str) – The file path to the parquet file, e.g.,
filename = 'user/data/titanic.parquet'
.**kwargs – Additional keyword arguments to pass to the pyarrow
read_table
function, e.g.,filters = [('Name','=','Alice')]
.
- Returns:
A new instance of
SQLDataModel
created from the parquet file.- Return type:
SQLDataModel
- Raises:
ModuleNotFoundError – If the required package
pyarrow
is not installed as determined by_has_pa
flag.TypeError – If the
filename
argument is not of type ‘str’ representing a valid parquet file path.FileNotFoundError – If the specified parquet
filename
is not found.Exception – If any unexpected exception occurs during the file or parquet reading process.
Example:
from SQLDataModel import SQLDataModel # Sample parquet file pq_file = "titanic.parquet" # Create the model sdm = SQLDataModel.from_parquet(pq_file) # View column counts print(sdm.count())
This will output:
┌────┬─────────────┬──────┬────────┬───────┬───────┐ │ │ column │ na │ unique │ count │ total │ ├────┼─────────────┼──────┼────────┼───────┼───────┤ │ 0 │ PassengerId │ 0 │ 891 │ 891 │ 891 │ │ 1 │ Survived │ 0 │ 2 │ 891 │ 891 │ │ 2 │ Pclass │ 0 │ 3 │ 891 │ 891 │ │ 3 │ Name │ 0 │ 891 │ 891 │ 891 │ │ 4 │ Sex │ 0 │ 2 │ 891 │ 891 │ │ 5 │ Age │ 177 │ 88 │ 714 │ 891 │ │ 6 │ SibSp │ 0 │ 7 │ 891 │ 891 │ │ 7 │ Parch │ 0 │ 7 │ 891 │ 891 │ │ 8 │ Ticket │ 0 │ 681 │ 891 │ 891 │ │ 9 │ Fare │ 0 │ 248 │ 891 │ 891 │ │ 10 │ Cabin │ 687 │ 147 │ 204 │ 891 │ │ 11 │ Embarked │ 2 │ 3 │ 889 │ 891 │ └────┴─────────────┴──────┴────────┴───────┴───────┘ [12 rows x 5 columns]
Note
The pyarrow package is required to use this method as well as the
SQLDataModel.to_parquet()
method.Once the file is read into pyarrow.parquet, the
to_pydict()
method is used to pass the data to this package’sSQLDataModel.from_dict()
method.Titanic parquet data used in example available at https://www.kaggle.com/code/taruntiwarihp/titanic-dataset
- classmethod from_pickle(filename: str = None, **kwargs) SQLDataModel
Returns the
SQLDataModel
object from the providedfilename
. IfNone
, the current directory will be scanned for the defaultSQLDataModel.to_pickle()
format.- Parameters:
filename (str, optional) – The name of the pickle file to load. If None, the current directory will be scanned for the default filename. Default is None.
**kwargs – Additional arguments to be passed to the SQLDataModel constructor, these will override the properties loaded from
filename
.
- Returns:
The SQLDataModel object created from the loaded pickle file.
- Return type:
SQLDataModel
- Raises:
TypeError – If filename is provided but is not of type ‘str’ representing a valid pickle filepath.
FileNotFoundError – If the provided filename could not be found or does not exist.
Example:
from SQLDataModel import SQLDataModel headers = ['Name','Age','Sex'] data = [('Alice', 20, 'F'), ('Bob', 25, 'M'), ('Gerald', 30, 'M')] # Create the model with sample data sdm = SQLDataModel(data=data, headers=headers) # Filepath pkl_file = 'people.sdm' # Save the model sdm.to_pickle(filename=pkl_file) # Load it back from file sdm = SQLDataModel.from_pickle(filename=pkl_file)
Note
All data, headers, data types and display properties will be saved when pickling.
Any additional
kwargs
provided will override those saved in the pickled model.
- classmethod from_polars(df, headers: list[str] = None, **kwargs) SQLDataModel
Returns a
SQLDataModel
object created from the provideddf
representing a PolarsDataFrame
object. Note thatpolars
must be installed in order to use this method.- Parameters:
df (polars.DataFrame) – The Polars DataFrame to convert to a SQLDataModel.
headers (list[str], optional) – The list of headers to use for the SQLDataModel. Default is None, using the columns from the
df
object.**kwargs – Additional arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the Polars DataFrame.
- Return type:
SQLDataModel
- Raises:
ModuleNotFoundError – If the required package
polars
is not found.TypeError – If
df
argument is not of typepolars.DataFrame
.
Example:
import polars as pl from SQLDataModel import SQLDataModel # Sample data data = { 'Name': ['Beth', 'John', 'Alice', 'Travis'], 'Age': [27, 30, 28, 35], 'Height': [172.4, 175.3, 162.0, 185.8] } # Create the polars DataFrame df = pl.DataFrame(data) # Create a SQLDataModel object sdm = SQLDataModel.from_polars(df) # View result print(sdm)
This will output a
SQLDataModel
constructed from the Polarsdf
:┌────────┬─────┬─────────┐ │ Name │ Age │ Height │ ├────────┼─────┼─────────┤ │ Beth │ 27 │ 172.40 │ │ John │ 30 │ 175.30 │ │ Alice │ 28 │ 162.00 │ │ Travis │ 35 │ 185.80 │ └────────┴─────┴─────────┘ [4 rows x 3 columns]
Note
If
headers
are not provided, the columns from the provided DataFrame’s columns will be used as the newSQLDataModel
headers.Polars uses different data types than those used by
SQLDataModel
, seeSQLDataModel.set_column_dtypes()
for specific casting rules.See related
SQLDataModel.to_polars()
for the inverse method of converting aSQLDataModel
into a PolarsDataFrame
object.
- classmethod from_pyarrow(table, **kwargs) SQLDataModel
Returns a new
SQLDataModel
instance from the provided Apache Arrow object.- Parameters:
table (pyarrow.lib.Table) – Apache Arrow object from which to construct a new
SQLDataModel
object.**kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.
- Raises:
ModuleNotFoundError – If the required package
pyarrow
is not installed.TypeError – If the provided table argument is not of type ‘pyarrow.lib.Table’.
- Returns:
A new SQLDataModel instance representing the data in the provided Apache Arrow object.
- Return type:
SQLDataModel
Example:
import pyarrow as pa from SQLDataModel import SQLDataModel # Sample data data = { 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'Grade': [3.8, 3.9, 3.2], } # Create PyArrow table from data table = pa.Table.from_pydict(data) # Create model from PyArrow table sdm = SQLDataModel.from_pyarrow(table)
This will output:
┌─────────┬──────┬───────┐ │ Name │ Age │ Grade │ ├─────────┼──────┼───────┤ │ Alice │ 25 │ 3.80 │ │ Bob │ 30 │ 3.90 │ │ Charlie │ 35 │ 3.20 │ └─────────┴──────┴───────┘ [3 rows x 3 columns]
Note
To convert an existing
SQLDataModel
instance to Apache Arrow format, seeSQLDataModel.to_pyarrow()
.This method is only for in-memory Apache Arrow table objects, for reading and writing parquet see
SQLDataModel.from_parquet()
.
- classmethod from_shape(shape: tuple[int, int], fill: Any = None, headers: list[str] = None, dtype: Literal['bytes', 'date', 'datetime', 'float', 'int', 'str'] = None, **kwargs) SQLDataModel
Returns a SQLDataModel from shape
(N rows, M columns)
as a convenience method to quickly build a model through an iterative approach. By default, no particular data type is assigned given the flexibility ofsqlite3
, however one can be inferred by providing an initialfill
value or explicitly by providing thedtype
argument.- Parameters:
shape (tuple[int, int]) – The shape to initialize the SQLDataModel with as
(M, N)
whereM
is the number of rows andN
is the number of columns.fill (Any, optional) – The scalar fill value to populate the new SQLDataModel with. Default is None, using SQL null values or deriving from
dype
if provided.headers (list[str], optional) – The headers to use for the model. Default is None, incrementing headers
0, 1, ..., N
whereN
is the number of columns.dtype (str, optional) – A valid python or SQL datatype to initialize the n-dimensional model with. Default is None, using the SQL text type.
**kwargs – Additional keyword arguments to pass to the
SQLDataModel
constructor.
- Raises:
TypeError – If
M
orN
are not of type ‘int’ representing a valid shape to initialize a SQLDataModel with.ValueError – If
M
orN
are not positive integer values representing valid nonzero row and column dimensions.ValueError – If
dtype
is not a valid python or SQL convertible datatype to initialize the model with.
- Returns:
Instance with the specified number of rows and columns, initialized with by
dtype
fill values or withNone
values (default).- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Create a 3x3 model filled by 'X' sdm = SQLDataModel.from_shape((3,3), fill='X') # View it print(sdm)
This will output a 3x3 grid of ‘X’ characters:
┌───┬─────┬─────┬─────┐ │ │ 0 │ 1 │ 2 │ ├───┼─────┼─────┼─────┤ │ 0 │ X │ X │ X │ │ 1 │ X │ X │ X │ │ 2 │ X │ X │ X │ └───┴─────┴─────┴─────┘ [3 rows x 3 columns]
We can iteratively build the model from the shape dimensions:
from SQLDataModel import SQLDataModel # Define shape shape = (6,6) # Initialize the multiplcation table with integer dtypes mult_table = SQLDataModel.from_shape(shape=shape, dtype='int') # Construct the table values for x in range(shape[0]): for y in range(shape[1]): mult_table[x, y] = x * y # View the multiplcation table print(mult_table)
This will output our 6x6 multiplication table:
┌───┬─────┬─────┬─────┬─────┬─────┬─────┐ │ │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ ├───┼─────┼─────┼─────┼─────┼─────┼─────┤ │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ │ 1 │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ │ 2 │ 0 │ 2 │ 4 │ 6 │ 8 │ 10 │ │ 3 │ 0 │ 3 │ 6 │ 9 │ 12 │ 15 │ │ 4 │ 0 │ 4 │ 8 │ 12 │ 16 │ 20 │ │ 5 │ 0 │ 5 │ 10 │ 15 │ 20 │ 25 │ └───┴─────┴─────┴─────┴─────┴─────┴─────┘ [6 rows x 6 columns]
- Change Log:
- Version 0.5.2 (2024-05-13):
Added
shape
parameter in lieu of separaten_rows
andn_cols
arguments.Added
fill
parameter to populate resulting SQLDataModel with values to override type-specific initialization defaults.Added
headers
parameter to explicitly set column names when creating the SQLDataModel.Added
**kwargs
parameter to align more closely with usage patterns of other model initializing constructor methods.
Note
If both
fill
anddtype
are provided, the data type will be derived fromtype(fill)
overriding or ignoring the specifieddtype
.If only
dtype
is provided, sensible default initialization fill values will be used to populate the model such as 0 or 0.0 for numeric and empty string or null for others.For those data types not natively implemented by
sqlite3
such asdate
anddatetime
, today’s date and now’s datetime will be used respectively for initialization values.
- classmethod from_sql(sql: str, con: Connection | Any, dtypes: dict = None, **kwargs) SQLDataModel
Create a
SQLDataModel
object by executing the provided SQL query using the specified SQL connection. If a single word is provided as thesql
, the method wraps it and executes a select all treating the text as the target table.- Supported Connection APIs:
SQLite using
sqlite3
PostgreSQL using
psycopg2
SQL Server ODBC using
pyodbc
Oracle using
cx_Oracle
Teradata using
teradatasql
- Parameters:
sql (str) – The SQL query to execute and use to create the SQLDataModel.
con (sqlite3.Connection | Any) – The database connection object, supported connection APIs are
sqlite3
,psycopg2
,pyodbc
,cx_Oracle
,teradatasql
dtypes (dict, optional) – A dictionary of the format
'column': 'python dtype'
to assign to values. Default is None, mapping types from source connection.**kwargs – Additional arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the executed SQL query.
- Return type:
SQLDataModel
- Raises:
TypeError – If dtypes argument is provided and is not of type
dict
representing python data types to assign to values.SQLProgrammingError – If the provided SQL connection is not opened or valid, or the SQL query is invalid or malformed.
DimensionError – If the provided SQL query returns no data.
Examples:
From SQL Table
from SQLDataModel import SQLDataModel # Single word parameter sdm = SQLDataModel.from_sql("table_name", sqlite3.Connection) # Equilavent query executed sdm = SQLDataModel.from_sql("select * from table_name", sqlite3.Connection)
From SQLite Database
import sqlite3 from SQLDataModel import SQLDataModel # Create connection object sqlite_db_conn = sqlite3.connect('./database/users.db') # Basic usage with a select query sdm = SQLDataModel.from_sql("SELECT * FROM my_table", sqlite_db_conn) # When a single word is provided, it is treated as a table name for a select all query sdm_table = SQLDataModel.from_sql("my_table", sqlite_db_conn)
From PostgreSQL Database
import psycopg2 from SQLDataModel import SQLDataModel # Create connection object pg_db_conn = psycopg2.connect('dbname=users user=postgres password=postgres') # Basic usage with a select query sdm = SQLDataModel.from_sql("SELECT * FROM my_table", pg_db_conn) # When a single word is provided, it is treated as a table name for a select all query sdm_table = SQLDataModel.from_sql("my_table", pg_db_conn)
From SQL Server Databse
import pyodbc from SQLDataModel import SQLDataModel # Create connection object con = pyodbc.connect("DRIVER={SQL Server};SERVER=host;DATABASE=db;UID=user;PWD=pw;") # Basic usage with a select query sdm = SQLDataModel.from_sql("SELECT * FROM my_table", con) # When a single word is provided, it is treated as a table name for a select all query sdm_table = SQLDataModel.from_sql("my_table", con)
Note
Unsupported connection object will output a
SQLDataModelWarning
advising unstable or undefined behaviour.The
dtypes
, if provided, are only applied tosqlite3
connection objects as remaining supported connections implement SQL to python adapters.See related
SQLDataModel.to_sql()
for writing to SQL database connections.
- classmethod from_text(text_source: str, table_identifier: int = 1, encoding: str = 'utf-8', headers: list[str] = None, **kwargs) SQLDataModel
Returns a new
SQLDataModel
generated from the providedtext_source
, either as a file if the path exists, or from a raw string literal if the path does not exist.- Parameters:
text_source (str) – The path to the tabular data file or a raw string literal containing tabular data.
table_identifier (int, optional) – The index position of the target table within the text source. Default is 1.
encoding (str, optional) – The encoding used to decode the text source if it is a file. Default is ‘utf-8’.
headers (list, optional) – The headers to use for the provided data. Default is to use the first row.
**kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the provided tabular data.
- Return type:
SQLDataModel
- Raises:
TypeError – If
text_source
is not a string ortable_identifier
is not an integer.ValueError – If no tabular data is found in
text_source
, if parsing fails to extract valid tabular data, or if the providedtable_identifier
is out of range.IndexError – If the provided
table_identifier
exceeds the number of tables found intext_source
.Exception – If an error occurs while attempting to read from or process the provided
text_source
.
Example:
from SQLDataModel import SQLDataModel # Text source containing tabular data text_source = "/path/to/tabular_data.txt" # Create the model using the text source sdm = SQLDataModel.from_text(text_source, table_identifier=2)
Note
This method is made for parsing
SQLDataModel
formatted text, such as the kind generated withprint(sdm)
or the output created by the inverse methodSQLDataModel.to_text()
For parsing other delimited tabular data, this method calls the related
SQLDataModel.from_csv()
method, which parses tabular data constructed with common delimiters.
- generate_apply_function_stub() str
Generates a function template using the current
SQLDataModel
to format function arguments for theSQLDataModel.apply_function_to_column()
method.- Returns:
A string representing the function template.
- Return type:
str
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('data.csv') # Create the stub stub = sdm.generate_apply_function_stub() # View it print(stub)
This will output:
def func(user_name:str, user_age:int, user_salaray:float): # apply logic and return value return
Containing all the required inputs and column names needed to generate a compatible function to apply to the model and can be copy pasted into existing code.
Note
This method is to meant as a general informative tool or for debugging assistance if needed
See
SQLDataModel.apply()
method for usage and implementation of functions in SQLDataModel usingsqlite3
- static generate_html_table_chunks(html_source: str) Generator[str, None, None]
Generate chunks of HTML content for all
<table>
elements found in provided source as complete and unbroken chunks for parsing.- Parameters:
html_source (str) – The raw HTML content from which to generate chunks.
- Raises:
ValueError – If zero
<table>
elements were found inhtml_source
provided.- Yields:
str
– Chunks of HTML content containing complete<table>
elements.
Example:
from SQLDataModel import SQLDataModel # HTML content to chunk html_source = ''' <html> <table><tr><td>Table 1</td></tr></table> ... <p>Non-table elements</p> ... <table><tr><td>Table 2</td></tr></table> </html> ''' # Generate and view the returned table chunks for chunk in SQLDataModel.generate_html_table_chunks(html_source): print('Chunk:', chunk)
This will output:
Chunk: <table><tr><td>Table 1</td></tr></table> Chunk: <table><tr><td>Table 2</td></tr></table>
Note
HTML content before the first
<table>
element and after the last</table>
element is ignored and not yielded.See
SQLDataModel.from_html()
for full implementation and how this function is used for HTML parsing.
- get_column_alignment() str
Returns the current
column_alignment
property value,dynamic
by default.- Returns:
The current value of the
column_alignment
property.- Return type:
str
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Get the current alignment value alignment = sdm.get_column_alignment() # Outputs 'dynamic' print(alignment)
Note
Use
SQLDataModel.set_column_alignment()
to modify column alignment.
- get_column_dtypes(columns: str | int | list = None, dtypes: Literal['python', 'sql'] = 'python') dict
Get the data types of specified columns as either Python or SQL datatypes as a
dict
in the format of{'column': 'dtype'}
.- Parameters:
columns (str | int | list) – The column or columns for which to retrieve data types. Defaults to all columns.
dtypes (Literal["python", "sql"]) – The format in which to retrieve data types. Defaults to “python”.
- Raises:
TypeError – If
columns
is not of typestr
,int
, orlist
.IndexError – If
columns
is of typeint
and the index is outside the valid range.ValueError – If a specified column in
columns
is not found in the current dataset. UseSQLDataModel.get_headers()
to view valid columns.
- Returns:
A dictionary mapping column names to their data types.
- Return type:
dict
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['first', 'last', 'age', 'service', 'hire_date'] data = [ ('John', 'Smith', 27, 1.22, '2023-02-01'), ('Sarah', 'West', 39, 0.7, '2023-10-01'), ('Mike', 'Harlin', 36, 3.9, '2020-08-27'), ('Pat', 'Douglas', 42, 11.5, '2015-11-06'), ('Kelly', 'Lee', 32, 8.0, '2016-09-18') ] # Create the model sdm = SQLDataModel(data, headers) # Get all column python dtypes sdm_dtypes = sdm.get_column_dtypes() # View dict items for col, dtype in sdm_dtypes.items(): print(f"{col}: {dtype}")
This will output:
first: str last: str age: int service: float hire_date: date
Get SQL data types as well:
# Get specific column sql dtypes sdm_dtypes = sdm.get_column_dtypes(columns=['first','age','service'], dtypes="sql") # View dict items for col, dtype in sdm_dtypes.items(): print(f"{col}: {dtype}")
This will output:
first: TEXT age: INTEGER service: REAL
Note
SQLDataModel index column is not included, only columns specified in the
SQLDataModel.headers
attribute are in scope.Only the dtypes are returned, any primary key references are removed to ensure compatability with external calls.
Python datatypes are returned in lower case, while SQL dtypes are returned in upper case to reflect convention.
See
SQLDataModel.dtypes
for direct mapping from column to data type returned as{'col': 'dtype'}
.
- get_display_float_precision() int
Retrieves the current float display precision used exclusively for representing the values of real numbers in the
repr
method for theSQLDataModel
. Default value is set to 4 decimal places of precision.- Returns:
The current float display precision.
- Return type:
int
Note
The float display precision is the number of decimal places to include when displaying real numbers in the string representation of the
SQLDataModel
.This value is utilized in the
repr
method to control the precision of real number values.The method does not affect the actual value of float dtypes in the underlying
SQLDataModel
- get_display_index() bool
Returns the current value set at
SQLDataModel.display_index
, which determines whether or not the index is displayed in theSQLDataModel
representation.- Returns:
The current value of the
display_index
property.- Return type:
bool
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Get the current value for displaying the index display_index = sdm.get_display_index() # Output: True print(display_index)
Note
Use
SQLDataModel.set_display_index()
to modify this property and toggle index display visibility.
- get_display_max_rows() int | None
Retrieves the current value at
SQLDataModel.display_max_rows
, which determines the maximum rows displayed for theSQLDataModel
.- Returns:
The current value set at
SQLDataModel.display_max_rows
.- Return type:
int
orNone
Example:
from SQLDataModel import SQLDataModel # Create model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Get current value display_max_rows = sdm.get_display_max_rows() # By default rows will be limited by current terminal height print(display_max_rows) # None
Note
This does not affect the actual number of rows in the model, only the maximum displayed.
Use
SQLDataModel.set_display_max_rows()
to explicitly set a max row limit instead of using terminal height.
- get_headers() list[str]
Returns the current
SQLDataModel
headers.- Returns:
A list of strings representing the headers.
- Return type:
list
Example:
from SQLDataModel import SQLDataModel # Create model sdm = SQLDataModel.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary']) # Get current model headers headers = sdm.get_headers() # Display values print(headers) # outputs: ['First Name', 'Last Name', 'Salary']
- get_indicies() tuple
Returns the current valid row indicies for the
SQLDataModel
instance.- Returns:
A tuple of the current values for
SQLDataModel.sql_idx
in ascending order.- Return type:
tuple
Example:
from SQLDataModel import SQLDataModel headers = ['Name', 'Age', 'Height'] data = [('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8)] # Create the model sdm = SQLDataModel(data, headers) # Get current valid indicies valid_indicies = sdm.get_indicies() # View results print(valid_indicies)
This will output:
(0, 1, 2)
- Notes
Primary use is to confirm valid model indexing when starting index != 0 or filtering changes minimum/maximum indexes.
- get_max_column_width() int
Returns the current
max_column_width
property value.- Returns:
The current value of the
max_column_width
property.- Return type:
int
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Get the current max column width value max_width = sdm.get_max_column_width() # Output print(max_width) # 32
- get_min_column_width() int
Returns the current
min_column_width
property value.- Returns:
The current value of the
min_column_width
property.- Return type:
int
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Get and save the current value min_width = sdm.get_min_column_width() # Output print(min_width) # 6
- get_model_name() str
Returns the
SQLDataModel
table name currently being used by the model as an alias for any SQL queries executed by the user and internally.- Returns:
The current
SQLDataModel
table name set by value of attributeSQLDataModel.model_name
.- Return type:
str
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['Column1', 'Column2']) # Get the current name model_name = sdm.get_model_name() # View it print(f'The model is currently using the table name: {model_name}')
- get_shape() tuple[int, int]
Returns the current shape of the
SQLDataModel
as a tuple of(rows x columns)
.- Returns:
A tuple representing the current dimensions of rows and columns in the
SQLDataModel
.- Return type:
tuple[int, int]
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel([[1,2,3], [4,5,6], [7,8,9]]) # Get the current shape shape = sdm.get_shape() # View it print("shape:", shape)
This will output:
shape: (3, 3)
The shape can also be seen when printing the model:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel([[1,2,3], [4,5,6], [7,8,9]]) # View it and the shape print(sdm, "<-- shape is also visible here")
This will output:
┌───┬───────┬───────┬───────┐ │ │ col_0 │ col_1 │ col_2 │ ├───┼───────┼───────┼───────┤ │ 0 │ 1 │ 2 │ 3 │ │ 1 │ 4 │ 5 │ 6 │ │ 2 │ 7 │ 8 │ 9 │ └───┴───────┴───────┴───────┘ [3 rows x 3 columns] <-- shape is also visible here
- Change Log:
- Version 0.3.6 (2024-04-09):
Returns the new
SQLDataModel.shape
directly, making this method redundant.
Note
If an empty model is initialized, the
SQLDataModel.row_count
will be 0 until the first row is inserted.Using the
SQLDataModel.__getitem__()
syntax ofsdm[row, col]
returns a new model instance with the corresponding shape.
- classmethod get_supported_sql_connections() tuple
Returns the currently tested DB API 2.0 dialects for use with
SQLDataModel.from_sql()
method.- Returns:
A tuple of supported DB API 2.0 dialects.
- Return type:
tuple
Example:
from SQLDataModel import SQLDataModel # Get supported dialects supported_dialects = SQLDataModel.get_supported_sql_connections() # View details print(supported_dialects) # Outputs supported_dialects = ('sqlite3', 'psycopg2', 'pyodbc', 'cx_oracle', 'teradatasql')
- group_by(columns: str | list[str], order_by_count: bool = True) SQLDataModel
Returns a new
SQLDataModel
after performing a group by operation on specified columns.- Parameters:
columns (str, list, tuple) – Columns to group by. Accepts either individual strings or a list/tuple of strings.
order_by_count (bool, optional) – If True (default), orders the result by count. If False, orders by the specified columns.
- Raises:
TypeError – If the columns argument is not of type str, list, or tuple.
ValueError – If any specified column does not exist in the current model.
SQLProgrammingError – If any specified columns or aggregate keywords are invalid or incompatible with the current model.
- Returns:
A new
SQLDataModel
instance containing the result of the group by operation.- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender'] data = [ ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'), ('Sarah', 'West', 39, 0.7, '2023-10-01', 'Female'), ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'), ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'), ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female') ] # Create the model sdm = SQLDataModel(data, headers, display_float_precision=2, display_index=True) # Group by 'gender' column sdm_gender = sdm.group_by("gender") # View model print(sdm_gender)
This will output:
┌───┬────────┬───────┐ │ │ gender │ count │ ├───┼────────┼───────┤ │ 0 │ Male │ 3 │ │ 1 │ Female │ 2 │ └───┴────────┴───────┘ [2 rows x 2 columns]
Multiple columns can also be used to group by:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('data.csv') # Group by multiple columns sdm.group_by(["country", "state", "city"])
Note
Use
order_by_count=False
to change ordering from count to column arguments provided.
- head(n_rows: int = 5) SQLDataModel
Returns the first
n_rows
of the currentSQLDataModel
.- Parameters:
n_rows (int, optional) – Number of rows to return. Defaults to 5.
- Raises:
TypeError – If
n_rows
argument is not of type ‘int’ representing the number of rows to return from the head of the model.- Returns:
A new
SQLDataModel
instance containing the specified number of rows.- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Countries data available for sample dataset url = 'https://developers.google.com/public-data/docs/canonical/countries_csv' # Create the model sdm = SQLDataModel.from_html(url) # Get head of model sdm_head = sdm.head() # View it print(sdm_head)
This will grab the top 5 rows by default:
┌───┬─────────┬──────────┬───────────┬────────────────┐ │ │ country │ latitude │ longitude │ name │ ├───┼─────────┼──────────┼───────────┼────────────────┤ │ 0 │ AF │ 33.9391 │ 67.7100 │ Afghanistan │ │ 1 │ AL │ 41.1533 │ 20.1683 │ Albania │ │ 2 │ DZ │ 28.0339 │ 1.6596 │ Algeria │ │ 3 │ AS │ -14.2710 │ -170.1322 │ American Samoa │ │ 4 │ AD │ 42.5462 │ 1.6016 │ Andorra │ └───┴─────────┴──────────┴───────────┴────────────────┘ [5 rows x 4 columns]
Note
See related
SQLDataModel.tail()
for the opposite, grabbing the bottomn_rows
from the current model.
- header_master
Maps the current model’s column metadata in the format of
'column_name': ('sql_dtype', 'py_dtype', is_regular_column, 'default_alignment')
, updated bySQLDataModel._update_model_metadata()
.- Type:
dict[str, tuple]
- headers
The current column names of the model. If not provided, default column names will be used.
- Type:
list[str]
- hstack(*other: SQLDataModel, inplace: bool = False) SQLDataModel
Horizontally stacks one or more
SQLDataModel
objects to the current model.- Parameters:
other (SQLDataModel or sequence of) – The SQLDataModel objects to horizontally stack.
inplace (bool, optional) – If True, performs the horizontal stacking in-place, modifying the current model. Defaults to False, returning a new
SQLDataModel
.
- Returns:
The horizontally stacked SQLDataModel instance when inplace is False.
- Return type:
SQLDataModel
- Raises:
ValueError – If no additional SQLDataModels are provided for horizontal stacking.
TypeError – If any argument in ‘other’ is not of type SQLDataModel, list, or tuple.
SQLProgrammingError – If an error occurs when updating the model values in place.
Example:
from SQLDataModel import SQLDataModel # Create models A and B sdm_a = SQLDataModel([('A', 'B'), ('1', '2')], headers=['A1', 'A2']) sdm_b = SQLDataModel([('C', 'D'), ('3', '4')], headers=['B1', 'B2']) # Horizontally stack B onto A sdm_ab = sdm_a.hstack(sdm_b) # View stacked model print(sdm_ab)
This will output the result of stacking B onto A, using each model’s headers and dtypes:
┌─────┬─────┬─────┬─────┐ │ A1 │ A2 │ B1 │ B2 │ ├─────┼─────┼─────┼─────┤ │ A │ B │ C │ D │ │ 1 │ 2 │ 3 │ 4 │ └─────┴─────┴─────┴─────┘ [2 rows x 4 columns]
Multiple models can be stacked simultaneously, here we stack a total of 3 models:
# Create a third model C sdm_c = SQLDataModel([('E', 'F'), ('5', '6')], headers=['C1', 'C2']) # Horizontally stack three models sdm_abc = sdm_a.hstack([sdm_b, sdm_c]) # View stacked result print(sdm_abc)
This will output the result of stacking C and B onto A:
┌─────┬─────┬─────┬─────┬─────┬─────┐ │ A1 │ A2 │ B1 │ B2 │ C1 │ C2 │ ├─────┼─────┼─────┼─────┼─────┼─────┤ │ A │ B │ C │ D │ E │ F │ │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ └─────┴─────┴─────┴─────┴─────┴─────┘ [2 rows x 6 columns]
Note
Model dimensions will be truncated or padded to coerce compatible dimensions when stacking, use
SQLDataModel.merge()
for strict SQL joins instead of hstack.Headers and data types are inherited from all the models being stacked, this requires aliasing duplicate column names if present, see
SQLDataModel.alias_duplicates()
for aliasing rules.Use
setitem
syntax such assdm['New Column'] = values
to create new columns directly into the current model instead of stacking or seeSQLDataModel.add_column_with_values()
for convenience method accomplishing the same.See
SQLDataModel.vstack()
for vertical stacking.
- indicies
The current valid row indicies of the model.
- Type:
tuple
- infer_dtypes(n_samples: int = 16, date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') None
Infer and set data types for columns based on a random subset of
n_samples
from the current model. Thedateutil
library is required for complex date and datetime parsing, if the module is not found thendate_format
anddatetime_format
will be used for dates and datetimes respectively.- Parameters:
n_samples (int) – The number of random samples to use for data type inference. Default set to 16.
date_format (str) – The format string to use for parsing date values if
dateutil
library is not found. Default is ‘%Y-%m-%d’.datetime_format (str) – The format string to use for parsing datetime values if
dateutil
library is not found. Default is ‘%Y-%m-%d %H:%M:%S’.
- Raises:
TypeError – If argument for
n_samples
is not of typeint
or if argument fordate_format
ordatetime_format
is not of type ‘str’.ValueError – If the current model contains zero columns from which to infer types from.
DimensionError – If the current model contains insufficient rows to sample from.
- Returns:
Inferred column types are updated and
None
is returned.- Return type:
None
Example:
from SQLDataModel import SQLDataModel # Sample data of ``str`` containing probable datatypes headers = ['first', 'last', 'age', 'service', 'hire_date'] data = [ ('John', 'Smith', '27', '1.22', '2023-02-01'), ('Sarah', 'West', '39', '0.7', '2023-10-01'), ('Mike', 'Harlin', '36', '3.9', '2020-08-27'), ('Pat', 'Douglas', '42', '11.5', '2015-11-06'), ('Kelly', 'Lee', '32', '8.0', '2016-09-18') ] # Create the model sdm = SQLDataModel(data, headers) # Get current column dtypes for reference dtypes_before = sdm.get_column_dtypes() # Infer and set data types based on 10 random samples sdm.infer_dtypes(n_samples=10) # View updated model print(sdm)
This will output data with dtypes correctly aligned:
┌───────┬─────────┬──────┬─────────┬────────────┐ │ first │ last │ age │ service │ hire_date │ ├───────┼─────────┼──────┼─────────┼────────────┤ │ John │ Smith │ 27 │ 1.22 │ 2023-02-01 │ │ Sarah │ West │ 39 │ 0.70 │ 2023-10-01 │ │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ │ Pat │ Douglas │ 42 │ 11.50 │ 2015-11-06 │ │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ └───────┴─────────┴──────┴─────────┴────────────┘ [5 rows x 5 columns]
# Get new column types to confirm dtypes_after = sdm.get_column_dtypes() # View updated dtypes for col in sdm.headers: print(f"{col:<10} {dtypes_before[col]} -> {dtypes_after[col]}")
This will output:
first: str -> str last: str -> str age: str -> int service: str -> float hire_date: str -> date
- Related:
See
SQLDataModel.infer_str_type()
for type determination process.See
SQLDataModel.infer_types_from_data()
for type voting scheme used for inference.
Note
If a single
str
instance is found in the samples, the corresponding column dtype will remain asstr
to avoid data loss.Co-occurences of
int
&float
, ordate
&datetime
will favor the superset dtype afterinfer_threshold
is met, sofloat
anddatetime
respectively.If a single
datetime
instance is found amongst a higher proportion ofdate
dtypes,datetime
will be used according to second rule.If a single
float
instance is found amongst a higher proportion ofint
dtypes,float
will be used according to second rule.Ties between dtypes are broken according to current type <
str
<float
<int
<datetime
<date
<bytes
<None
This method calls the
set_column_dtypes()
method once the column dtypes have been inferred if they differ from the current dtype.
- static infer_str_type(obj: str, date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') str
Infer the data type of the input object.
- Parameters:
obj (str) – The object for which the data type is to be inferred.
date_format (str) – The format string to use for parsing date values. Default is ‘%Y-%m-%d’.
datetime_format (str) – The format string to use for parsing datetime values. Default is ‘%Y-%m-%d %H:%M:%S’.
- Returns:
The inferred data type.
- Return type:
str
- Inference:
'str'
: If the input object is a string, or cannot be parsed as another data type.'date'
: If the input object represents a date without time information.'datetime'
: If the input object represents a datetime with both date and time information.'int'
: If the input object represents an integer.'float'
: If the input object represents a floating-point number.'bool'
: If the input object represents a boolean value.'bytes'
: If the input object represents a binary array.'None'
: If the input object is None, empty, or not a string.
Note
This method attempts to infer the data type of the input object by evaluating its content.
If the input object is a string, it is parsed to determine whether it represents a date, datetime, integer, or float.
If the input object is not a string or cannot be parsed, its type is determined based on its Python type (bool, int, float, bytes, or None).
- static infer_types_from_data(input_data: list[list], date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') list[str]
Infer the best types of
input_data
by using a simple presence-based voting scheme. Sampling is assumed prior to function call, treatinginput_data
as already a sampled subset from the original data.- Parameters:
input_data (list[list]) – A list of lists containing the input data.
date_format (str) – The format string to use for parsing date values. Default is ‘%Y-%m-%d’.
datetime_format (str) – The format string to use for parsing datetime values. Default is ‘%Y-%m-%d %H:%M:%S’.
- Returns:
A list representing the best-matching inferred types for each column based on the sampled data.
- Return type:
list
Note
If multiple types are present in the samples, the most appropriate type is inferred based on certain rules.
If a column contains both
date
anddatetime
instances, the type is inferred asdatetime
.If a column contains both
int
andfloat
instances, the type is inferred asfloat
.If a column contains only
str
instances or multiple types with no clear choice, the type remains asstr
.
- Related:
See
SQLDataModel.infer_str_type()
for type determination process.
- insert_row(index: int, values: list | tuple, on_conflict: Literal['replace', 'ignore'] = 'replace') None
Inserts a new row into the
SQLDataModel
at the specifiedindex
with the providedvalues
.- Parameters:
index (int) – The position at which to insert the row.
values (list or tuple) – The values to be inserted into the row.
on_conflict (Literal['replace', 'ignore'], optional) – Specifies the action to take if the index already exists. Default is ‘replace’.
- Raises:
TypeError – If
index
is not an integer orvalues
is not a list or tuple.ValueError – If
on_conflict
is not'replace'
or'ignore'
.DimensionError – If the dimensions of the provided
values
are incompatible with the current model dimensions.SQLProgrammingError – If there is an issue with the SQL execution during the insertion.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Sample data data = [('Alice', 20, 'F'), ('Billy', 25, 'M'), ('Chris', 30, 'M')] # Create the model sdm = SQLDataModel(data, headers=['Name','Age','Sex']) # Insert a new row at index 3 sdm.insert_row(3, ['David', 35, 'M']) # Insert or replace row at index 1 sdm.insert_row(1, ['Beth', 27, 'F'], on_conflict='replace') # View result print(sdm)
This will output the modified model:
┌───┬───────┬─────┬─────┐ │ │ Name │ Age │ Sex │ ├───┼───────┼─────┼─────┤ │ 0 │ Alice │ 20 │ F │ │ 1 │ Beth │ 27 │ F │ │ 2 │ Chris │ 30 │ M │ │ 3 │ David │ 35 │ M │ └───┴───────┴─────┴─────┘ [4 rows x 3 columns]
- Change Log:
- Version 0.6.0 (2024-05-14):
Backward incompatible changes made to arguments and behavior, added
index
andon_conflict
parameters for greater specificity and to align with broader conventions surrounding insert methods.
Note
Use
on_conflict = 'ignore'
to take no action if row already exists, andon_conflict = 'replace'
to replace it.See
SQLDataModel.append_row()
for appending rows at the next available index instead of insertion at index.
- iter_rows(min_row: int = None, max_row: int = None, index: bool = True, include_headers: bool = False) Generator
Returns a generator object of the rows in the model from
min_row
to max_row.- Parameters:
min_row (int, optional) – The minimum row index to start iterating from (inclusive). Defaults to None.
max_row (int, optional) – The maximum row index to iterate up to (exclusive). Defaults to None.
index (bool, optional) – Whether to include the row index in the output. Defaults to True.
include_headers (bool, optional) – Whether to include headers as the first row. Defaults to False.
- Yields:
Generator
– Rows from the specified range, including headers if specified.
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary']) # Iterate over the rows for row in sdm.iter_rows(min_row=2, max_row=4): pass # Do stuff
- Change Log:
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
- iter_tuples(include_idx_col: bool = False) Generator
Returns a generator object of the
SQLDataModel
as namedtuples using current headers as field names.- Parameters:
include_idx_col (bool, optional) – Whether to include the index column in the namedtuples. Defaults to False.
- Raises:
ValueError – Raised if headers are not valid Python identifiers. Use
SQLDataModel.normalize_headers()
method to fix.- Yields:
Generator
– Namedtuples representing rows with field names based on current headers.
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary']) # Iterate over the namedtuples for row_tuple in sdm.iter_tuples(include_idx_col=True): pass # Do stuff with namedtuples
- max() SQLDataModel
Returns a new
SQLDataModel
containing the maximum value of all non-null values for each column in a row-wise orientation.- Returns:
A new SQLDataModel containing the maximum non-null value for each column.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data with missing values headers = ['Name', 'Age', 'Gender', 'Tenure'] data = [ ('Alice', 25, 'Female', 1.0), ('Bob', None, 'Male', 2.7), ('Charlie', 30, 'Male', None), ('David', None, 'Male', 3.8) ] # Create the model sdm = SQLDataModel(data, headers) # Get maximum values min_values = sdm.min() # View result print(min_values)
This will output the maximum value of all non-null values for each column:
┌───────┬─────┬────────┬────────┐ │ Name │ Age │ Gender │ Tenure │ ├───────┼─────┼────────┼────────┤ │ David │ 30 │ Male │ 3.80 │ └───────┴─────┴────────┴────────┘ [1 rows x 4 columns]
Note
See
SQLDataModel.count_unique()
for column-wise count of unique, null and total values for each column.See
SQLDataModel.min()
for returning the minimum values in each column.
- max_column_width
The maximum column width in characters to use for string representations of the data. Default is 38.
- Type:
int
- mean() SQLDataModel
Returns a new
SQLDataModel
containing the mean value of all viable columns in the current model. Calculated bysum(x_i, ..., x_n) * (1 / N)
- Returns:
A new SQLDataModel containing the mean values of each column.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Birthday', 'Height', 'Date of Hire'] data = [ ('John', 30, '1994-06-15', 175.3, '2018-03-03 11:20:19'), ('Alice', 28, '1996-11-20', 162.0, '2023-04-24 08:45:30'), ('Travis', 37, '1987-01-07', 185.8, '2012-10-06 15:30:40') ] # Create the model and infer correct types sdm = SQLDataModel(data, headers, infer_dtypes=True) # View full model print(sdm)
This will output the sample model we’ll be using to calculate mean values for:
┌────────┬─────┬────────────┬─────────┬─────────────────────┐ │ Name │ Age │ Birthday │ Height │ Date of Hire │ ├────────┼─────┼────────────┼─────────┼─────────────────────┤ │ John │ 30 │ 1994-06-15 │ 175.30 │ 2018-03-03 11:20:19 │ │ Alice │ 28 │ 1996-11-20 │ 162.00 │ 2023-04-24 08:45:30 │ │ Travis │ 37 │ 1987-01-07 │ 185.80 │ 2012-10-06 15:30:40 │ └────────┴─────┴────────────┴─────────┴─────────────────────┘ [3 rows x 5 columns]
Now let’s find the mean values:
# Calculate the mean values sdm_mean = sdm.mean() # View result print(sdm_mean)
This will output the mean values for the “Age”, “Birthday”, “Height” and “Date of Hire” columns:
┌──────┬────────┬────────────┬─────────┬─────────────────────┐ │ Name │ Age │ Birthday │ Height │ Date of Hire │ ├──────┼────────┼────────────┼─────────┼─────────────────────┤ │ NaN │ 31.67 │ 1992-10-14 │ 174.37 │ 2018-01-30 11:52:09 │ └──────┴────────┴────────────┴─────────┴─────────────────────┘ [1 rows x 5 columns]
Note
Only non-null values are included in the calculation of the sum and the total number of values in the column, use
SQLDataModel.fillna()
to fill null values.For
date
anddatetime
columns values are converted to julian days prior to calculation and recast into original data type, some imprecision may occur as a result.See
SQLDataModel.min()
for returning the minimum value,SQLDataModel.max()
for maximum value, andSQLDataModel.describe()
for descriptive statical values.
- merge(merge_with: SQLDataModel = None, how: Literal['left', 'right', 'inner', 'full outer', 'cross'] = 'left', left_on: str = None, right_on: str = None, include_join_column: bool = False) SQLDataModel
Merges two
SQLDataModel
instances based on specified columns and merge type,how
, returning the result as a new instance. If the join column shares the same name in both models,left_on
andright_on
column arguments are not required and will be inferred. Otherwise, explicit arguments for both are required.- Parameters:
merge_with (SQLDataModel) – The SQLDataModel to merge with the current model.
how (Literal["left", "right", "inner", "full outer", "cross"]) – The type of merge to perform.
left_on (str) – The column name from the current model to use as the left join key.
right_on (str) – The column name from the
merge_with
model to use as the right join key.include_join_column (bool) – If the shared column being used as the join key should be included from both tables. Default is False.
- Raises:
TypeError – If
merge_with
is not of typeSQLDataModel
.DimensionError – If no shared column exists, and explicit
left_on
andright_on
arguments are not provided.ValueError – If the specified
left_on
orright_on
column is not found in the respective models.
- Returns:
A new SQLDataModel containing the product of the merged result.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Left table data with ID column left_headers = ["Name", "Age", "ID"] left_data = [ ["Bob", 35, 1], ["Alice", 30, 5], ["David", 40, None], ["Charlie", 25, 2] ] # Right table data with shared ID column right_headers = ["ID", "Country"] right_data = [ [1, "USA"], [2, "Germany"], [3, "France"], [4, "Latvia"] ] # Create the left and right tables sdm_left = SQLDataModel(left_data, left_headers) sdm_right = SQLDataModel(right_data, right_headers)
Here are the left and right tables we will be joining:
Left Table: Right Table: ┌─────────┬──────┬──────┐ ┌──────┬─────────┐ │ Name │ Age │ ID │ │ ID │ Country │ ├─────────┼──────┼──────┤ ├──────┼─────────┤ │ Bob │ 35 │ 1 │ │ 1 │ USA │ │ Alice │ 30 │ 5 │ │ 2 │ Germany │ │ David │ 40 │ │ │ 3 │ France │ │ Charlie │ 25 │ 2 │ │ 4 │ Latvia │ └─────────┴──────┴──────┘ └──────┴─────────┘ [4 rows x 3 columns] [4 rows x 2 columns]
Left Join
# Create a model by performing a left join with the tables sdm_joined = sdm_left.merge(sdm_right, how="left") # View result print(sdm_joined)
This will output:
Left Join: ┌─────────┬──────┬──────┬─────────┐ │ Name │ Age │ ID │ Country │ ├─────────┼──────┼──────┼─────────┤ │ Bob │ 35 │ 1 │ USA │ │ Alice │ 30 │ 5 │ │ │ David │ 40 │ │ │ │ Charlie │ 25 │ 2 │ Germany │ └─────────┴──────┴──────┴─────────┘ [4 rows x 4 columns]
Right Join
# Create a model by performing a right join with the tables sdm_joined = sdm_left.merge(sdm_right, how="right") # View result print(sdm_joined)
This will output:
Right Join: ┌─────────┬──────┬──────┬─────────┐ │ Name │ Age │ ID │ Country │ ├─────────┼──────┼──────┼─────────┤ │ Bob │ 35 │ 1 │ USA │ │ Charlie │ 25 │ 2 │ Germany │ │ │ │ │ France │ │ │ │ │ Latvia │ └─────────┴──────┴──────┴─────────┘ [4 rows x 4 columns]
Inner Join
# Create a model by performing an inner join with the tables sdm_joined = sdm_left.merge(sdm_right, how="inner") # View result print(sdm_joined)
This will output:
Inner Join: ┌─────────┬──────┬──────┬─────────┐ │ Name │ Age │ ID │ Country │ ├─────────┼──────┼──────┼─────────┤ │ Bob │ 35 │ 1 │ USA │ │ Charlie │ 25 │ 2 │ Germany │ └─────────┴──────┴──────┴─────────┘ [2 rows x 4 columns]
Full Outer Join
# Create a model by performing a full outer join with the tables sdm_joined = sdm_left.merge(sdm_right, how="full outer") # View result print(sdm_joined)
This will output:
Full Outer Join: ┌─────────┬──────┬──────┬─────────┐ │ Name │ Age │ ID │ Country │ ├─────────┼──────┼──────┼─────────┤ │ Bob │ 35 │ 1 │ USA │ │ Alice │ 30 │ 5 │ │ │ David │ 40 │ │ │ │ Charlie │ 25 │ 2 │ Germany │ │ │ │ │ France │ │ │ │ │ Latvia │ └─────────┴──────┴──────┴─────────┘ [6 rows x 4 columns]
Cross Join
# Create a model by performing a cross join with the tables sdm_joined = sdm_left.merge(sdm_right, how="cross") # View result print(sdm_joined)
This will output:
Cross Join: ┌─────────┬──────┬──────┬─────────┐ │ Name │ Age │ ID │ Country │ ├─────────┼──────┼──────┼─────────┤ │ Bob │ 35 │ 1 │ USA │ │ Bob │ 35 │ 1 │ Germany │ │ Bob │ 35 │ 1 │ France │ │ Bob │ 35 │ 1 │ Latvia │ │ Alice │ 30 │ 5 │ USA │ │ Alice │ 30 │ 5 │ Germany │ │ Alice │ 30 │ 5 │ France │ │ Alice │ 30 │ 5 │ Latvia │ │ David │ 40 │ │ USA │ │ David │ 40 │ │ Germany │ │ David │ 40 │ │ France │ │ David │ 40 │ │ Latvia │ │ Charlie │ 25 │ 2 │ USA │ │ Charlie │ 25 │ 2 │ Germany │ │ Charlie │ 25 │ 2 │ France │ │ Charlie │ 25 │ 2 │ Latvia │ └─────────┴──────┴──────┴─────────┘ [16 rows x 4 columns]
Note
If
include_join_column=False
then only theleft_on
join column is included in the result, with theright_on
column removed to avoid redundant shared key values.If
include_join_column=True
then all the columns from both models are included in the result, with aliasing to avoid naming conflicts, seeSQLDataModel.alias_duplicates()
for details.The resulting
SQLDataModel
is created based on thesqlite3
join definition and specified columns and merge type, for details seesqlite3
documentation.
- min() SQLDataModel
Returns a new
SQLDataModel
containing the minimum value of all non-null values for each column in a row-wise orientation.- Returns:
A new SQLDataModel containing the minimum non-null value for each column.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data with missing values headers = ['Name', 'Age', 'Gender', 'Tenure'] data = [ ('Alice', 25, 'Female', 1.0), ('Bob', None, 'Male', 2.7), ('Charlie', 30, 'Male', None), ('David', None, 'Male', 3.8) ] # Create the model sdm = SQLDataModel(data, headers) # Get minimum values min_values = sdm.min() # View result print(min_values)
This will output the minimum value of all non-null values for each column:
┌───────┬─────┬────────┬────────┐ │ Name │ Age │ Gender │ Tenure │ ├───────┼─────┼────────┼────────┤ │ Alice │ 25 │ Female │ 1.00 │ └───────┴─────┴────────┴────────┘ [1 rows x 4 columns]
Note
See
SQLDataModel.count_unique()
for column-wise count of unique, null and total values for each column.See
SQLDataModel.max()
for returning the maximum values in each column.
- min_column_width
The minimum column width in characters to use for string representations of the data. Default is 3.
- Type:
int
- normalize_headers(apply_function: Callable = None) None
Reformats the current
SQLDataModel
headers into an uncased normalized form using alphanumeric characters only. WrapsSQLDataModel.set_headers()
.- Parameters:
apply_function (Callable, optional) – Specify an alternative normalization pattern. When
None
, the pattern'[^0-9a-z _]+'
will be used on uncased values.- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create model sdm = SQLDataModel.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary']) # Use default normalization scheme, uncased and strips invalid SQL identifiers sdm.normalize_headers() # Get renamed headers after default normalization sdm.get_headers() # now outputs ['first_name', 'last_name', 'salary'] # Or use custom renaming scheme sdm.normalize_headers(lambda x: x.upper()) # Get renamed headers again sdm.get_headers() # now outputs ['FIRST_NAME', 'LAST_NAME', 'SALARY']
- rename_column(column: int | str, new_column_name: str) None
Renames a column in the
SQLDataModel
at the specified index or using the old column name with the provided value innew_column_name
.- Parameters:
column (int|str) – The index or current str value of the column to be renamed.
new_column_name (str) – The new name as a str value for the specified column.
- Raises:
TypeError – If the
column
ornew_column_name
parameters are invalid types.IndexError – If the provided column index is outside the current column range.
SQLProgrammingError – If there is an issue with the SQL execution during the column renaming.
Example:
from SQLDataModel import SQLDataModel headers = ['idx', 'first', 'last', 'age'] data = [ (0, 'john', 'smith', 27) ,(1, 'sarah', 'west', 29) ,(2, 'mike', 'harlin', 36) ,(3, 'pat', 'douglas', 42) ] # Create the model with sample data sdm = SQLDataModel(data,headers) # Example: Rename the column at index 1 to 'first_name' sdm.rename_column(1, 'first_name') # Get current values new_headers = sdm.get_headers() # Outputs ['first_name', 'last', 'age'] print(new_headers)
Note
The method allows renaming a column identified by its index in the SQLDataModel.
Handles negative indices by adjusting them relative to the end of the column range.
If an error occurs during SQL execution, it rolls back the changes and raises a SQLProgrammingError with an informative message.
- replace(pattern: str, replacement: str, inplace: bool = False, **kwargs) SQLDataModel
Replaces matching occurrences of a specified pattern with a replacement value in the
SQLDataModel
instance. If inplace is True, the method updates the existing SQLDataModel; otherwise, it returns a newSQLDataModel
with the replacements applied.- Parameters:
pattern (str) – The substring or regular expression pattern to search for in each column.
replacement (str) – The string to replace the matched pattern with.
inplace (bool, optional) – If True, modifies the current SQLDataModel instance in-place. Default is False.
**kwargs – Additional keyword arguments to be passed to the
execute_fetch
method when not in-place.
- Raises:
TypeError – If the
pattern
orreplacement
parameters are invalid types.- Returns:
If
inplace=True
, modifies the current instance in-place and returnsNone
. Otherwise, returns a new SQLDataModel with the specified replacements applied.- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel headers = ['first', 'last', 'age', 'service'] data = [ ('John', 'Smith', 27, 1.22), ('Sarah', 'West', 39, 0.7), ('Mike', 'Harlin', 36, 3), ('Pat', 'Douglas', 42, 11.5) ] # Create the model sdm = SQLDataModel(data, headers,display_float_precision=2, display_index=False) # Replace 'John' in the 'first' column sdm['first'] = sdm['first'].replace("John","Jane") # View model print(sdm)
This will output:
┌───────┬─────────┬──────┬─────────┐ │ first │ last │ age │ service │ ├───────┼─────────┼──────┼─────────┤ │ Jane │ Smith │ 27 │ 1.22 │ │ Sarah │ West │ 39 │ 0.70 │ │ Mike │ Harlin │ 36 │ 3.00 │ │ Pat │ Douglas │ 42 │ 11.50 │ └───────┴─────────┴──────┴─────────┘ [4 rows x 4 columns]
- reset_index(start_index: int = 0) None
Resets the index of the
SQLDataModel
instance inplace to zero-based sequential autoincrement, or to specifiedstart_index
base with sequential incrementation.- Parameters:
start_index (int, optional) – The starting index for the reset operation. Defaults to 0.
- Raises:
TypeError – If provided
start_index
argument is not of typeint
ValueError – If the specified
start_index
is greater than the minimum index in the current model.SQLProgrammingError – If reset index execution results in constraint violation or programming error.
Example:
from SQLDataModel import SQLDataModel headers = ['idx', 'first', 'last', 'age', 'service'] data = [ (0, 'john', 'smith', 27, 1.22), (1, 'sarah', 'west', 39, 0.7), (2, 'mike', 'harlin', 36, 3), (3, 'pat', 'douglas', 42, 11.5) ] # Create the model sdm = SQLDataModel(data, headers) # View current state print(sdm)
This will output:
┌─────┬────────┬─────────┬────────┬─────────┐ │ │ first │ last │ age │ service │ ├─────┼────────┼─────────┼────────┼─────────┤ │ 994 │ john │ smith │ 27 │ 1.22 │ │ 995 │ sarah │ west │ 39 │ 0.70 │ │ 996 │ mike │ harlin │ 36 │ 3.00 │ │ 997 │ pat │ douglas │ 42 │ 11.50 │ └─────┴────────┴─────────┴────────┴─────────┘ [4 rows x 4 columns]
Now reset the index column:
from SQLDataModel import SQLDataModel # Reset the index with default start value sdm.reset_index() # View updated model print(sdm)
This will output:
┌───┬────────┬─────────┬────────┬─────────┐ │ │ first │ last │ age │ service │ ├───┼────────┼─────────┼────────┼─────────┤ │ 0 │ john │ smith │ 27 │ 1.22 │ │ 1 │ sarah │ west │ 39 │ 0.70 │ │ 2 │ mike │ harlin │ 36 │ 3.00 │ │ 3 │ pat │ douglas │ 42 │ 11.50 │ └───┴────────┴─────────┴────────┴─────────┘ [4 rows x 4 columns]
Reset the index to a custom value:
from SQLDataModel import SQLDataModel # Reset the index with a different value sdm.reset_index(start_index = -3) # View updated model print(sdm)
This will output:
┌────┬────────┬─────────┬────────┬─────────┐ │ │ first │ last │ age │ service │ ├────┼────────┼─────────┼────────┼─────────┤ │ -3 │ john │ smith │ 27 │ 1.22 │ │ -2 │ sarah │ west │ 39 │ 0.70 │ │ -1 │ mike │ harlin │ 36 │ 3.00 │ │ 0 │ pat │ douglas │ 42 │ 11.50 │ └────┴────────┴─────────┴────────┴─────────┘ [4 rows x 4 columns]
Note
The current index should be viewed more as a soft row number, to assign hard indicies use
SQLDataModel.freeze_index()
method.Setting
start_index
too a very large negative or positive integer made lead to unpredictable behavior.
- row_count
The current row count of the model.
- Type:
int
- sample(n_samples: float | int = 0.05, **kwargs) SQLDataModel
Return a random sample of size
n_samples
as a newSQLDataModel
.- Parameters:
n_samples (float | int) – Number of rows or proportion of rows to sample. Default set to
0.05
, proportional to 5% of the currentSQLDataModel.row_count
. Ifn_samples
is an integer, it represents the exact number of rows to sample where0 < n_samples <= row_count
. Ifn_samples
is a float, it represents the proportion of rows to sample where0.0 < n_samples <= 1.0
.- Returns:
A new SQLDataModel instance containing the sampled rows.
- Return type:
SQLDataModel
- Raises:
TypeError – If the
n_samples
parameter is not of type ‘int’ or ‘float’.ValueError – If the
n_samples
value is invalid or out of range.
This method generates a random sample of rows from the current SQLDataModel. The number of rows to sample can be specified either as an integer representing the exact number of rows or as a float representing the proportion of rows to sample. The sampled rows are returned as a new SQLDataModel instance.
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Amount']) # Example 1: Sample 10 random rows sample_result = sdm.sample(n_samples=10) # Create the model sdm2 = SQLDataModel.from_csv('another_example.csv', headers=['Code', 'Description', 'Price']) # Example 2: Sample 20% of rows sample_result2 = sdm2.sample(n_samples=0.2)
Note
If the current model’s
SQLDataModel.row_count
value is less than the sample size, the current row count will be used instead.
- set_column_alignment(alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic') None
Sets the default alignment behavior for
SQLDataModel
whenrepr
orprint
is called, modifiescolumn_alignment
attribute. Default behavior set to'dynamic'
, which right-aligns numeric data types, left-aligns all other types, with headers matching value alignment.- Parameters:
alignment (str) – The column alignment setting to use.
'dynamic'
: Default behavior, dynamically aligns columns based on column data types.'left'
: Left-align all column values.'center'
: Center-align all column values.'right'
: Right-align all column values.- Raises:
TypeError – If the argument for alignment is not of type ‘str’.
ValueError – If the provided alignment is not one of ‘dynamic’, ‘left’, ‘center’, ‘right’.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Set to right-align columns sdm.set_column_alignment('right') # Output print(sdm)
This will output the model with values right-aligned:
┌───┬────────┬─────────┬────────┬─────────┐ │ │ first │ last │ age │ service │ ├───┼────────┼─────────┼────────┼─────────┤ │ 0 │ john │ smith │ 27 │ 1.22 │ │ 1 │ sarah │ west │ 39 │ 0.70 │ │ 2 │ mike │ harlin │ 36 │ 3.00 │ │ 3 │ pat │ douglas │ 42 │ 11.50 │ └───┴────────┴─────────┴────────┴─────────┘
Setting columns to be left-aligned:
# Set to left-align sdm.set_column_alignment('left') # Output print(sdm)
This will output the model with left-aligned values instead:
┌───┬────────┬─────────┬────────┬─────────┐ │ │ first │ last │ age │ service │ ├───┼────────┼─────────┼────────┼─────────┤ │ 0 │ john │ smith │ 27 │ 1.22 │ │ 1 │ sarah │ west │ 39 │ 0.70 │ │ 2 │ mike │ harlin │ 36 │ 3.00 │ │ 3 │ pat │ douglas │ 42 │ 11.50 │ └───┴────────┴─────────┴────────┴─────────┘
Note
Use
SQLDataModel.get_column_alignment()
to return the current column alignment setting.When using ‘center’, if the column contents cannot be perfectly centralized, the left side will be favored.
Use ‘dynamic’ to return to default column alignment, which is right-aligned for numeric types and left-aligned for others.
See
SQLDataModel.set_table_style()
for modifying table format and available styles.
- set_column_dtypes(column: str | int, dtype: Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str']) None
Casts the specified
column
into the provided pythondtype
. The datatype must be a valid convertable python datatype to map to an equivalent SQL datatype.- Parameters:
column (str or int) – The name or index of the column to be cast, must be current header or within range of current
column_count
dtype (Literal['bool', 'bytes', 'datetime', 'float', 'int', 'None', 'str']) – The target python data type for the specified column.
- Raises:
TypeError – If
column
is not of type ‘str’ or ‘int’.IndexError – If
column
is an integer and the index is outside of the current model range.ValueError – If
column
is a string and the column is not found in the current model.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel headers = ['idx', 'first', 'last', 'age'] data = [ (0, 'john', 'smith', 27) ,(1, 'sarah', 'west', 29) ,(2, 'mike', 'harlin', 36) ,(3, 'pat', 'douglas', 42) ] # Create the SQLDataModel object sdm = SQLDataModel(data, headers) # Original dtype for comparison old_dtype = sdm.get_column_dtypes('age') # Set the data type of the 'age' column to 'float' sdm.set_column_dtypes('age', 'float') # Confirm column dtype new_dtype = sdm.get_column_dtypes('age') # View result print(f"Age dtype: {old_dtype} -> {new_dtype}")
This will output:
Age dtype: int -> float
Warning
Type casting will coerce any nonconforming values to the
dtype
being set, this means data will be lost if casted incorrectly.
- set_display_color(color: str | tuple)
Sets the table string representation color when
SQLDataModel
is displayed in the terminal.- Parameters:
color (str or tuple) – Color to set. Accepts hex value (e.g.,
'#A6D7E8'
) or tuple of RGB values (e.g.,(166, 215, 232)
).- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['Name', 'Age', 'Salary']) # Set color with hex value sdm.set_display_color('#A6D7E8')
Note
By default, no color styling is applied and the native terminal color is used.
To use rgb values, ensure a single tuple is provided as an argument.
- set_display_float_precision(float_precision: int) None
Sets the current float display precision to the specified value for use in the
repr
method of theSQLDataModel
when representing float data types. Note that this precision limit is overridden by themax_column_width
value if the precision limit exceeds the specified maximum width.- Parameters:
float_precision (int) – The desired float display precision to be used for real number values.
- Raises:
TypeError – If the
float_precision
argument is not of type ‘int’.ValueError – If the
float_precision
argument is a negative value, as it must be a valid f-string precision identifier.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel headers = ['idx', 'first', 'last', 'age', 'service_time'] data = [ (0, 'john', 'smith', 27, 1.22) ,(1, 'sarah', 'west', 0.7) ,(2, 'mike', 'harlin', 3) ,(3, 'pat', 'douglas', 11.5) ] # Create the model with sample data sdm = SQLDataModel(data,headers) # Example: Set the float display precision to 2 sdm.set_display_float_precision(2) # View model print(sdm)
This will output:
┌───┬────────┬─────────┬────────┬──────────────┐ │ │ first │ last │ age │ service_time │ ├───┼────────┼─────────┼────────┼──────────────┤ │ 0 │ john │ smith │ 27 │ 2.10 │ │ 1 │ sarah │ west │ 29 │ 0.80 │ │ 2 │ mike │ harlin │ 36 │ 1.30 │ │ 3 │ pat │ douglas │ 42 │ 7.02 │ └───┴────────┴─────────┴────────┴──────────────┘
# Get the updated float display precision updated_precision = sdm.get_display_float_precision() # Outputs 2 print(updated_precision)
Note
The
display_float_precision
attribute only affects the precision for displaying real or floating point values.The actual precision of the stored value in the model is unaffected by the value set.
- set_display_index(display_index: bool) None
Sets the value for
SQLDataModel.display_index
to enable or disable the inclusion of theSQLDataModel
index value in print or repr calls.- Parameters:
display_index (bool) – Whether or not to include the index in
SQLDataModel
representations.- Raises:
TypeError – If the provided argument is not a boolean value.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Disable displaying index sdm.set_display_index(False)
Note
Use
SQLDataModel.set_table_style()
to more broadly modify the appearance and formatting style ofSQLDataModel
string representations.
- set_display_max_rows(rows: int | None) None
Sets value at
SQLDataModel.display_max_rows
to limit maximum rows displayed whenrepr
orprint
is called. Userows = None
to derive max number to display from the current terminal height.- Parameters:
rows (int) – The maximum number of rows to display.
- Raises:
TypeError – If the provided argument is not
None
or is not an integer.IndexError – If the provided value is an integer less than or equal to 0.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Any call to `print` or `repr` will be restricted to 500 max rows sdm.set_display_max_rows(500) # Alternatively, auto-detect dimensions by setting to `None` sdm.set_display_max_rows(None)
Note
Modifying
SQLDataModel.display_max_rows
does not affect the actual number of rows in the model, only the maximum rows displayed.
- set_headers(new_headers: list[str]) None
Renames the current
SQLDataModel
headers to values provided innew_headers
. Headers must have the same dimensions and match existing headers.- Parameters:
new_headers (list) – A list of new header names. It must have the same dimensions as the existing headers.
- Raises:
TypeError – If the
new_headers
type is not a valid type (list or tuple).DimensionError – If the length of
new_headers
does not match the column count.TypeError – If the type of the first element in
new_headers
is not a valid type (str, int, or float).
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create model sdm = SQLDataModel.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary']) # Set new headers sdm.set_headers(['First_Name', 'Last_Name', 'Payment'])
- set_max_column_width(width: int) None
Set
max_column_width
as the maximum number of characters per column whenrepr
orprint
is called.- Parameters:
width (int) – The maximum width for each column.
- Returns:
Sets the
max_column_width
property.- Return type:
None
Note
If
max_column_width
is set to a value below the currentmin_column_width
property, the maximum width will override the minimum width.The minimum required width is
2
, whenmax_column_width < 2
,2
will be used regardless of thewidth
provided.
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Change the max column width for the table representation sdm.set_max_column_width(20)
- set_min_column_width(width: int) None
Set
min_column_width
as the minimum number of characters per column whenrepr
orprint
is called.- Parameters:
width (int) – The minimum width for each column.
- Returns:
Sets the
min_column_width
property.- Return type:
None
Note
If
min_column_width
is set to a value below the currentmax_column_width
property, the maximum width will override the minimum width.The minimum required width is
2
, whenmin_column_width < 2
,2
will be used regardless of thewidth
provided.
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['ID', 'Name', 'Value']) # Set a new minimum column width value sdm.set_min_column_width(8) # Check updated value print(sdm.get_min_column_width) # 8
- set_model_name(new_name: str) None
Sets the new
SQLDataModel
table name that will be used as an alias for any SQL queries executed by the user or internally.- Parameters:
new_name (str) – The new table name for the
SQLDataModel
.- Raises:
SQLProgrammingError – If unable to rename the model table due to SQL execution failure.
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel.from_csv('example.csv', headers=['Column1', 'Column2']) # Rename the model sdm.set_model_name('custom_table')
Note
The provided value must be a valid SQL table name.
This alias will be reset to the default value for any new
SQLDataModel
instances:'sdm'
.
- set_table_style(style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round'] = 'default') None
Sets the table style used for string representations of
SQLDataModel
.- Parameters:
style (Literal['ascii','bare','dash','default','double','list','markdown','outline','pandas','polars','postgresql','round']) – The table styling to set, use
'default'
for original style.- Raises:
ValueError – If
style
provided is not one of the currently supported options ‘ascii’,’bare’,’dash’,’default’,’double’,’list’,’markdown’,’outline’,’pandas’,’polars’,’postgresql’ or ‘round’.- Returns:
None
Examples:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height', 'Birthday'] data = [ ('Alice', 28, 162.08, '1996-11-20'), ('Bobby', 30, 175.36, '1994-06-15'), ('Craig', 37, 185.82, '1987-01-07'), ('David', 32, 179.75, '1992-12-28') ] # Create the model sdm = SQLDataModel(data, headers) # Lets try the round style sdm.set_table_style('round') # View it print(sdm)
This outputs the
'round'
table style:╭───────┬─────┬─────────┬────────────╮ │ Name │ Age │ Height │ Birthday │ ├───────┼─────┼─────────┼────────────┤ │ Alice │ 28 │ 162.08 │ 1996-11-20 │ │ Bobby │ 30 │ 175.36 │ 1994-06-15 │ │ Craig │ 37 │ 185.82 │ 1987-01-07 │ │ David │ 32 │ 179.75 │ 1992-12-28 │ ╰───────┴─────┴─────────┴────────────╯
Alternatively, set
style = 'ascii'
to formatSQLDataModel
in the ASCII style, the OG of terminal tables:+-------+-----+---------+------------+ | Name | Age | Height | Birthday | +-------+-----+---------+------------+ | Alice | 28 | 162.08 | 1996-11-20 | | Bobby | 30 | 175.36 | 1994-06-15 | | Craig | 37 | 185.82 | 1987-01-07 | | David | 32 | 179.75 | 1992-12-28 | +-------+-----+---------+------------+
Set
style = 'bare'
to formatSQLDataModel
in the following style:Name Age Height Birthday ------------------------------- Alice 28 162.08 1996-11-20 Bobby 30 175.36 1994-06-15 Craig 37 185.82 1987-01-07 David 32 179.75 1992-12-28
Set
style = 'dash'
to formatSQLDataModel
with dashes for internal borders:┌───────┬─────┬─────────┬────────────┐ │ Name ╎ Age ╎ Height ╎ Birthday │ ├╴╴╴╴╴╴╴┼╴╴╴╴╴┼╴╴╴╴╴╴╴╴╴┼╴╴╴╴╴╴╴╴╴╴╴╴┤ │ Alice ╎ 28 ╎ 162.08 ╎ 1996-11-20 │ │ Bobby ╎ 30 ╎ 175.36 ╎ 1994-06-15 │ │ Craig ╎ 37 ╎ 185.82 ╎ 1987-01-07 │ │ David ╎ 32 ╎ 179.75 ╎ 1992-12-28 │ └───────┴─────┴─────────┴────────────┘
Set
style = 'default'
to formatSQLDataModel
in the following style, which also happens to be the default styling applied:┌───────┬─────┬─────────┬────────────┐ │ Name │ Age │ Height │ Birthday │ ├───────┼─────┼─────────┼────────────┤ │ Alice │ 28 │ 162.08 │ 1996-11-20 │ │ Bobby │ 30 │ 175.36 │ 1994-06-15 │ │ Craig │ 37 │ 185.82 │ 1987-01-07 │ │ David │ 32 │ 179.75 │ 1992-12-28 │ └───────┴─────┴─────────┴────────────┘ [4 rows x 4 columns]
Set
style = 'list'
to formatSQLDataModel
as a list of values, similar to the SQLite CLI representation:Name Age Height Birthday ----- --- ------- ---------- Alice 28 162.08 1996-11-20 Bobby 30 175.36 1994-06-15 Craig 37 185.82 1987-01-07 David 32 179.75 1992-12-28
Set
style = 'double'
to formatSQLDataModel
using double line borders:╔═══════╦═════╦═════════╦════════════╗ ║ Name ║ Age ║ Height ║ Birthday ║ ╠═══════╬═════╬═════════╬════════════╣ ║ Alice ║ 28 ║ 162.08 ║ 1996-11-20 ║ ║ Bobby ║ 30 ║ 175.36 ║ 1994-06-15 ║ ║ Craig ║ 37 ║ 185.82 ║ 1987-01-07 ║ ║ David ║ 32 ║ 179.75 ║ 1992-12-28 ║ ╚═══════╩═════╩═════════╩════════════╝
Set
style = 'markdown'
to formatSQLDataModel
in the Markdown style:| Name | Age | Height | Birthday | |-------|-----|---------|------------| | Alice | 28 | 162.08 | 1996-11-20 | | Bobby | 30 | 175.36 | 1994-06-15 | | Craig | 37 | 185.82 | 1987-01-07 | | David | 32 | 179.75 | 1992-12-28 |
Set
style = 'outline'
to formatSQLDataModel
in the following style:┌─────────────────────────────────┐ │ Name Age Height Birthday │ ├─────────────────────────────────┤ │ Alice 28 162.08 1996-11-20 │ │ Bobby 30 175.36 1994-06-15 │ │ Craig 37 185.82 1987-01-07 │ │ David 32 179.75 1992-12-28 │ └─────────────────────────────────┘
Set
style = 'pandas'
to formatSQLDataModel
in the style used by Pandas DataFrames:Name Age Height Birthday Alice 28 162.08 1996-11-20 Bobby 30 175.36 1994-06-15 Craig 37 185.82 1987-01-07 David 32 179.75 1992-12-28
Set
style = 'polars'
to formatSQLDataModel
in the style used by Polars DataFrames:┌───────┬─────┬─────────┬────────────┐ │ Name ┆ Age ┆ Height ┆ Birthday │ ╞═══════╪═════╪═════════╪════════════╡ │ Alice ┆ 28 ┆ 162.08 ┆ 1996-11-20 │ │ Bobby ┆ 30 ┆ 175.36 ┆ 1994-06-15 │ │ Craig ┆ 37 ┆ 185.82 ┆ 1987-01-07 │ │ David ┆ 32 ┆ 179.75 ┆ 1992-12-28 │ └───────┴─────┴─────────┴────────────┘
Set
style = 'postgresql'
to formatSQLDataModel
in the style used by PostgreSQL:Name | Age | Height | Birthday ------+-----+---------+----------- Alice | 28 | 162.08 | 1996-11-20 Bobby | 30 | 175.36 | 1994-06-15 Craig | 37 | 185.82 | 1987-01-07 David | 32 | 179.75 | 1992-12-28
- Change Log:
- Version 0.3.11 (2024-04-18):
Removed
'thick'
style and added'list'
style for greater variety of available formats.
Note
The labels given to certain styles are entirely subjective and do not in any way express original design or ownership of the styling used.
Legacy character sets on older terminals may not support all the character encodings required for some styles.
- shape
The current dimensions of the model as a tuple of
(rows, columns)
.- Type:
tuple[int, int]
- sort(by: str | int | list = None, asc: bool = True) SQLDataModel
Sort columns in the dataset by the specified ordering. If no value is specified, the current
SQLDataModel.sql_idx
column is used with the default orderingasc = True
.- Parameters:
by (str | int | list, optional) – The column or list of columns by which to sort the dataset. Defaults to sorting by the dataset’s index.
asc (bool, optional) – If True, sort in ascending order; if False, sort in descending order. Defaults to ascending order.
- Raises:
TypeError – If value for
by
argument is not one of type ‘str’, ‘int’ or ‘list’.ValueError – If a specified column in
by
is not found in the current dataset or is an invalid column.IndexError – If columns are indexed by integer but are outside of the current model range.
- Returns:
A new instance of SQLDataModel with columns sorted according to the specified ordering.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel headers = ['first', 'last', 'age', 'service', 'hire_date'] data = [ ('John', 'Smith', 27, 1.22, '2023-02-01'), ('Sarah', 'West', 39, 0.7, '2023-10-01'), ('Mike', 'Harlin', 36, 3.9, '2020-08-27'), ('Pat', 'Douglas', 42, 11.5, '2015-11-06'), ('Kelly', 'Lee', 32, 8.0, '2016-09-18') ] # Create the model sdm = SQLDataModel(data, headers) # Sort by last name column sorted_sdm = sdm.sort('last') # View sorted model print(sorted_sdm)
This will output:
┌───┬───────┬─────────┬──────┬─────────┬────────────┐ │ │ first │ last │ age │ service │ hire_date │ ├───┼───────┼─────────┼──────┼─────────┼────────────┤ │ 0 │ Pat │ Douglas │ 42 │ 11.50 │ 2015-11-06 │ │ 1 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ │ 2 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ │ 3 │ John │ Smith │ 27 │ 1.22 │ 2023-02-01 │ │ 4 │ Sarah │ West │ 39 │ 0.70 │ 2023-10-01 │ └───┴───────┴─────────┴──────┴─────────┴────────────┘ [5 rows x 5 columns]
Sort by multiple columns:
# Sort by multiple columns in descending order sorted_sdm = sdm.sort(['age','hire_date'], asc=False) # View sorted print(sorted_sdm)
This will output:
┌───┬───────┬─────────┬──────┬─────────┬────────────┐ │ │ first │ last │ age │ service │ hire_date │ ├───┼───────┼─────────┼──────┼─────────┼────────────┤ │ 0 │ Pat │ Douglas │ 42 │ 11.50 │ 2015-11-06 │ │ 1 │ Sarah │ West │ 39 │ 0.70 │ 2023-10-01 │ │ 2 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ │ 3 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ │ 4 │ John │ Smith │ 27 │ 1.22 │ 2023-02-01 │ └───┴───────┴─────────┴──────┴─────────┴────────────┘ [5 rows x 5 columns]
- Change Log:
- Version 0.5.1 (2024-05-10):
Modified to allow integer indexing for column sort order in
by
argument.
Note
Standard sorting process for
sqlite3
is used, whereby the ordering prefers the first column mentioned to the last.Ascending and descending ordering follows this order of operations for multiple columns as well.
- sql_db_conn
The in-memory sqlite3 connection object in use by the model.
- Type:
sqlite3.Connection
- sql_idx
The index column name applied to the sqlite3 in-memory representation of the model. Default is
'idx'
- Type:
str
- sql_model
The table name applied to the sqlite3 in-memory representation of the model. Default is
'sdm'
- Type:
str
- static sqlite_cast_type_format(param: str = '?', dtype: Literal['None', 'int', 'float', 'str', 'bytes', 'date', 'datetime', 'NoneType', 'bool'] = 'str', as_binding: bool = True, as_alias: bool = False)
Formats the specified param to be cast consistently into the python type specified for insert params or as a named alias param.
- Parameters:
param (str) – The parameter to be formatted.
dtype (Literal['None', 'int', 'float', 'str', 'bytes', 'date', 'datetime', 'NoneType', 'bool']) – The python data type of the parameter as a string.
as_binding (bool, optional) – Whether to format as a binding parameter (default is True).
as_alias (bool, optional) – Whether to include an alias for the parameter (default is False).
- Returns:
The parameter formatted for SQL type casting.
- Return type:
str
Note
This function provides consistent formatting for casting parameters into specific data types for SQLite, changing it will lead to unexpected behaviors.
- static sqlite_printf_format(column: str, dtype: str, max_pad_width: int, float_precision: int = 4, alignment: str = None) str
Formats SQLite SELECT clauses based on column parameters to provide preformatted fetches, providing most of the formatting for
repr
output.- Parameters:
column (str) – The name of the column.
dtype (str) – The data type of the column (‘float’, ‘int’, ‘index’, or other).
max_pad_width (int) – The maximum width to pad the output.
float_precision (int, optional) – The precision for floating-point numbers (default is 4).
alignment (str, optional) – The alignment of the output (‘<’, ‘>’, or None for no alignment).
- Returns:
The formatted SELECT clause for SQLite.
- Return type:
str
Note
This function generates SQLite SELECT clauses for single column only.
The output preformats SELECT result to fit
repr
method for tabular output.The return
str
is not valid SQL by itself, representing only the single column select portion.
- static_py_to_sql_map_dict
The data type mapping to use when converting python types to SQL column types.
- Type:
dict
- static_sql_to_py_map_dict
The data type mapping to use when converting SQL column types to python types.
- Type:
dict
- strip(characters: str = None, str_dtype_only: bool = True, inplace: bool = False) SQLDataModel | None
Removes the specified characters from the beginning and end of each value in the current
SQLDataModel
removing leading and trailing whitespace characters by default.- Parameters:
characters (str, optional) – The characters to remove from both ends of the value. Default is None, removing whitespace (
' '
,'\t'
,'\n'
,'\r'
).str_dtype_only (bool, optional) – If True, only columns with dtype = ‘str’ are stripped, otherwise all columns are stripped. Default is True.
inplace (bool, optional) – If True, modifies the current SQLDataModel instance in-place. Default is False.
- Raises:
TypeError – If
characters
argument is provided and is not of type'str'
representing unordered characters to remove.- Returns:
If
inplace=False
, returns a new SQLDataModel with the stripped values. Otherwise modifies the current instance in-place returning None.- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Create a single item model sdm = SQLDataModel([[' Hello, World! ']]) # Strip whitespace and print print(sdm.strip())
This will output the model after stripping the leading and trailing whitespace characters:
┌───┬───────────────┐ │ │ 0 │ ├───┼───────────────┤ │ 0 │ Hello, World! │ └───┴───────────────┘ [1 rows x 1 columns]
Non-whitespace characters can also be stripped:
from SQLDataModel import SQLDataModel headers = ['Col A', 'Col B', 'Col C'] data = [ ['A1', 'B1', 'C1'], ['A2', 'B2', 'C2'], ['A3', 'B3', 'C3'] ] # Create the sample model sdm = SQLDataModel(data, headers) # Strip leading and trailing 'A' character sdm_stripped = sdm.strip('A') # View result print(sdm_stripped)
This will output a new model where any leading and trailing ‘A’ characters have been removed:
┌───────┬───────┬───────┐ │ Col A │ Col B │ Col C │ ├───────┼───────┼───────┤ │ 1 │ B1 │ C1 │ │ 2 │ B2 │ C2 │ │ 3 │ B3 │ C3 │ └───────┴───────┴───────┘ [3 rows x 3 columns]
Multiple characters can be stripped, and the model modified inplace:
# Strip multiple characters and this time modify model inplace sdm.strip('123', inplace=True) # View result print(sdm)
This will output the modified model after stripping leading and trailing ‘123’ characters:
┌───────┬───────┬───────┐ │ Col A │ Col B │ Col C │ ├───────┼───────┼───────┤ │ A │ B │ C │ │ A │ B │ C │ │ A │ B │ C │ └───────┴───────┴───────┘ [3 rows x 3 columns]
Note
For string replacement instead of string removal, see
SQLDataModel.replace()
.When using
str_dtype_only = False
, numeric values may be modified due to SQLite’s type affinity rules.This method is equivalent to the SQLite
trim(string, character)
function, wrapping and passing the equivalent arguments.
- table_style
The table style used for string representations of the model. Available styles are
'ascii'
,'bare'
,'dash'
,'default'
,'double'
,'list'
,'markdown'
,'outline'
,'pandas'
,'polars'
,'postgresql'
or'round'
. Defaults to'default'
table style.- Type:
str
- tail(n_rows: int = 5) SQLDataModel
Returns the last
n_rows
of the currentSQLDataModel
.- Parameters:
n_rows (int, optional) – Number of rows to return. Defaults to 5.
- Raises:
TypeError – If
n_rows
argument is not of type ‘int’ representing the number of rows to return from the tail of the model.- Returns:
A new
SQLDataModel
instance containing the specified number of rows.- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Countries data available for sample dataset url = 'https://developers.google.com/public-data/docs/canonical/countries_csv' # Create the model sdm = SQLDataModel.from_html(url) # Get tail of model sdm_tail = sdm.tail() # View it print(sdm_tail)
This will grab the bottom 5 rows by default:
┌─────┬─────────┬──────────┬───────────┬───────────────────┐ │ │ country │ latitude │ longitude │ name │ ├─────┼─────────┼──────────┼───────────┼───────────────────┤ │ 240 │ WF │ -13.7688 │ -177.1561 │ Wallis and Futuna │ │ 241 │ EH │ 24.2155 │ -12.8858 │ Western Sahara │ │ 242 │ YE │ 15.5527 │ 48.5164 │ Yemen │ │ 243 │ ZM │ -13.1339 │ 27.8493 │ Zambia │ │ 244 │ ZW │ -19.0154 │ 29.1549 │ Zimbabwe │ └─────┴─────────┴──────────┴───────────┴───────────────────┘ [5 rows x 4 columns]
Note
See related
SQLDataModel.head()
for the opposite, grabbing the topn_rows
from the current model.
- to_csv(filename: str = None, delimiter: str = ',', quotechar: str = '"', lineterminator: str = '\r\n', na_rep: str = 'None', encoding: str = 'utf-8', index: bool = False, **kwargs) str | None
Writes
SQLDataModel
to the specified file iffilename
argument if provided, otherwise returns the model directly as a CSV formatted string literal.- Parameters:
filename (str) – The name of the CSV file to which the data will be written. Default is None, returning as raw literal.
delimiter (str, optional) – The delimiter to use for separating values. Default is ‘,’.
quotechar (str, optional) – The character used to quote fields. Default is ‘”’.
lineterminator (str, optional) – The character used to terminate the row and move to a new line. Default is ‘rn’.
na_rep (str, optional) – String representation to use for null or missing values. Default is ‘None’.
encoding (str, optional) – The encoding to use when writing the model to a CSV file. Default is ‘utf-8’.
index (bool, optional) – If True, includes the index in the CSV file; if False, excludes the index. Default is False.
**kwargs – Additional arguments to be passed to the
csv.writer
constructor.
- Returns:
If
filename
is None, returns the model as a delimited string literal,None
iffilename
is provided, writing the model to the specified file as a CSV file.- Return type:
str
|None
Example:
Returning CSV Literal
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8) ] # Create the model sdm = SQLDataModel(data, headers) # Generate the literal using tab delimiter csv_literal = sdm.to_csv(delimiter='\t') # View output print(csv_literal)
This will output:
Name Age Height John 30 175.3 Alice 28 162.0 Travis 35 185.8
Write to File
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8) ] # Create the model sdm = SQLDataModel(data, headers) # CSV filename csv_file = 'persons.csv' # Write to the file, keeping the index sdm.to_csv(filename=csv_file, index=True)
Contents of
persons.csv
:idx,Name,Age,Height 0,John,30,175.3 1,Alice,28,162.0 2,Travis,35,185.8
- Change Log:
- Version 0.6.4 (2024-05-17):
Added
encoding
parameter to pass to file handler when writing contents as CSV file and set default toutf-8
to align with expected SQLite codec.
- Version 0.4.0 (2024-04-23):
Modified quoting behavior to avoid redundant quoting and to closely mimic csv module from standard library.
Added
na_rep
to fill null or missing values when generating output, useful for space delimited data and minimal quoting.
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
When
index=True
, thesdm_index
property determines the column name of the index in the result.Modifying
delimiter
affects how the data is delimited when writing tofilename
and when returning as raw literal, any valid delimiter can be used.Quoting behavior can be modified by providing an additional keywork arg such as
quoting=1
to wrap all values in quotes, orquoting=2
to quote only non-numeric values, seecsv.QUOTE_X
enums for all options.Use
SQLDataModel.to_text()
to pretty print table in specified style for visualizing output if strict delimiting is unnecessary.See
SQLDataModel.from_csv()
for creating a newSQLDataModel
from existing CSV data
- to_dict(orient: Literal['rows', 'columns', 'list'] = 'rows', index: bool = None) dict | list[dict]
Converts the
SQLDataModel
instance to a dictionary or a list of dictionaries based on the specified orientation.- Parameters:
orient (Literal["rows", "columns", "list"]) – The orientation of the output, see examples for more detail.
"rows"
: Returns a dictionary with index values as keys and row values as values."columns"
: Returns a dictionary with column names as keys and column values as tuples."list"
: Returns a list of dictionaries, where each dictionary represents a row.index (bool) – Whether to include the index column in the output. Defaults to the display_index property.
- Raises:
ValueError – if value for
orient
is not one of “rows”, “columns” or “list”.- Returns:
The converted data structure based on the specified orientation.
- Return type:
dict
|list[dict]
Examples:
Orient by Rows
from SQLDataModel import SQLDataModel # Sample data headers = ['Col A','Col B', 'Col C'] data = [ ['A,0', 'A,1', 'A,2'], ['B,0', 'B,1', 'B,2'], ['C,0', 'C,1', 'C,2'] ] # Create the model sdm = SQLDataModel(data, headers) # Convert to dictionary with rows as keys and values rows_dict = sdm.to_dict(orient="rows") # View output for k, v in rows_dict.items(): print(f"{k}: {v}")
This will output:
0: ('A,0', 'B,0', 'C,0') 1: ('A,1', 'B,1', 'C,1') 2: ('A,2', 'B,2', 'C,2')
Orient by Columns
# Convert to dictionary with columns as keys and rows as values columns_dict = sdm.to_dict(orient="columns") # View output for k, v in columns_dict.items(): print(f"{k}: {v}")
This will output:
Col A: ('A,0', 'A,1', 'A,2') Col B: ('B,0', 'B,1', 'B,2') Col C: ('C,0', 'C,1', 'C,2')
Orient by List
# Convert to list of dictionaries with each dictionary representing a row with columns as keys list_dict = sdm.to_dict(orient="list") # View output for row in list_dict: print(row)
This will output:
{'Col A': 'A,0', 'Col B': 'B,0', 'Col C': 'C,0'} {'Col A': 'A,1', 'Col B': 'B,1', 'Col C': 'C,1'} {'Col A': 'A,2', 'Col B': 'B,2', 'Col C': 'C,2'}
- Change Log:
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
Use
index
to return index data, otherwise current instancedisplay_index
value will be used.For
'list'
orientation, data returned is JSON-like in structure, where each row has its own “column”: “value” data.
- to_excel(filename: str, worksheet: int | str = 0, index: bool = False) None
Writes the current
SQLDataModel
to the specified Excelfilename
.- Parameters:
filename (str) – The file path to save the Excel file, e.g.,
filename = 'output.xlsx'
.worksheet (int | str, optional) – The index or name of the worksheet to write to. Defaults to 0, indicating the first worksheet.
index (bool, optional) – If
SQLDataModel
index should be included in the output. Default is False.
- Raises:
ModuleNotFoundError – If the required package
openpyxl
is not installed as determined by_has_xl
flag.TypeError – If the
filename
argument is not of type ‘str’ representing a valid Excel file path to create or write to.IndexError – If
worksheet
is provided as type ‘int’ but is out of range of the available worksheets.Exception – If any unexpected exception occurs during the Excel writing and saving process.
- Returns:
If successful, a new Excel file
filename
is created andNone
is returned.- Return type:
None
Example:
import openpyxl from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Rate', 'Gender'] data = [ ('Alice', 25, 26.50, 'Female'), ('Bob', 30, 21.25, 'Male'), ('Will', 35, 24.00, 'Male'), ('Mary', 32, 23.75, 'Female') ] # Create the model sdm = SQLDataModel(data, headers) # Export into a new Excel file sdm.to_excel('Team-Overview.xlsx') # Or append to existing Excel file as a new worksheet sdm.to_excel('Team.xlsx', worksheet='Demographics')
This will create a new Excel file
Team-Overview.xlsx
:┌───────┬──────┬────────┬────────┐ │ A │ B │ C │ D │ ┌───┼───────┼──────┼────────┼────────┤ │ 1 │ Name │ Age │ Gender │ Rate │ │ 2 │ Alice │ 25 │ Female │ 26.50 │ │ 3 │ Mary │ 32 │ Female │ 23.75 │ │ 4 │ Bobby │ 30 │ Male │ 21.25 │ │ 5 │ Will │ 35 │ Male │ 24.00 │ └───┴───────┴──────┴────────┴────────┘ [ Sheet1 ]
- Change Log:
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
If provided
filename
does not exist, it will be created. If it already exists, model data will be appended to the existing worksheet contents.When providing a string argument for
worksheet
, if the sheet does not exist, it will be created. However if providing an integer index for an out of range sheet, anIndexError
will be raised.See related
SQLDataModel.from_excel()
for creating aSQLDataModel
from existing Excel content.
- to_html(filename: str = None, index: bool = None, encoding: str = 'utf-8', style_params: dict = None) str
Returns the current SQLDataModel as a lightly formatted HTML <table> element as a string if
filename
is None. Iffilename
is specified, writes the HTML to the specified file as .html and returns None.- Parameters:
filename (str) – The file path to save the HTML content. If None, returns the HTML as a string (default is None).
index (bool) – Whether to include the index column in the HTML table (default is current
display_index
).encoding (str) – Character encoding to use when writing model to HTML file, default set to
'utf-8'
.style_params (dict) – A dictionary representing CSS styles {property: value} to customize the appearance of the HTML table (default is None).
- Raises:
TypeError – If
filename
is not a valid string when specified or ifstyle_params
is not a dictionary when specified.OSError – If encountered while trying to open and write the HTML to the file.
- Returns:
If
filename
is None, returns the HTML content as a string. Iffilename
is specified, writes to the file and returns None.- Return type:
str
|None
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel(data=[(1, 'John'), (2, 'Doe')], headers=['ID', 'Name']) # Create and save as new html file sdm.to_html('output.html', style_params={'font-size': '12pt'}) # Get HTML as a string html_string = sdm.to_html() # View output print(html_string)
This will output:
<table> <tr> <th>ID</th> <th>Name</th> </tr> <tr> <td>1</td> <td>John</td> </tr> <tr> <td>2</td> <td>Doe</td> </tr> </table> <style> table {font:size: 12pt;} </style>
- Change Log:
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
Base styles are applied to reflect the styling of
SQLDataModel
in the terminal, including anydisplay_color
which is applied to the table CSS.Table index is determined by the instance
display_index
attribute unless specified in the argument of the same name, overriding the instance attribute.The default background-color is #E5E5E5, and the default font color is #090909, with 1 px solid border to mimic the
repr
for the instance.
- to_json(filename: str = None, index: bool = None, **kwargs) list | None
Converts the
SQLDataModel
instance to JSON format. Iffilename
is specified, the JSON is written to the file; otherwise, a JSON-like object is returned.- Parameters:
filename (str) – The path to the file where JSON will be written. If None, no file is created and JSON-like object is returned.
index (bool) – Whether to include the index column in the JSON. Defaults to the
display_index
property.**kwargs – Additional keyword arguments to pass to the json.dump() method.
- Raises:
TypeError – If
filename
is not of type ‘str’.Exception – If there is an OS related error encountered when opening or writing to the provided
filename
.
- Returns:
If
filename
is None, a list containing a JSON-like object is returned. Otherwise JSON file created and returnsNone
.- Return type:
list
|None
Examples:
To JSON Literal
from SQLDataModel import SQLDataModel # Sample JSON to first create model json_source = [ {"id": 1, "color": "red", "value": "#f00", "notes": "primary"} ,{"id": 2, "color": "green", "value": "#0f0", "notes": None} ,{"id": 3, "color": "blue", "value": "#00f", "notes": "primary"} ] # Create the model sdm = SQLDataModel.from_json(json_source) # View current state print(sdm)
This will output:
┌─────┬───────┬───────┬─────────┐ │ id │ color │ value │ notes │ ├─────┼───────┼───────┼─────────┤ │ 1 │ red │ #f00 │ primary │ │ 2 │ green │ #0f0 │ │ │ 3 │ blue │ #00f │ primary │ └─────┴───────┴───────┴─────────┘ [3 rows x 4 columns]
Write JSON File
# Write model to JSON file sdm.to_json('output.json') # Or convert to JSON-like object json_data = sdm.to_json() # View JSON object print(json_data)
This will output:
[{ "id": 1, "color": "red", "value": "#f00", "notes": "primary" }, { "id": 2, "color": "green", "value": "#0f0", "notes": null }, { "id": 3, "color": "blue", "value": "#00f", "notes": "primary" }]
- Change Log:
- Version 0.3.2 (2024-04-02):
Changed return object to JSON string literal when
filename=None
to convert to valid literal object.
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
When no filename is specified, JSON-like object will be returned as a rowwise array.
Any nested structure will be flattened by this method as well as the
SQLDataModel.from_json()
method.
- to_latex(filename: str = None, index: bool = False, bold_headers: bool = False, min_column_width: int = None, max_column_width: int = None, format_output_as: Literal['table', 'document'] = 'table', column_alignment: Literal['left', 'center', 'right', 'dynamic'] = None) str | None
Returns the current
SQLDataModel
as a LaTeX table string iffilename
is None, otherwise writes the table to the provided file as a LaTeX document.- Parameters:
filename (str, optional) – The name of the file to write the LaTeX content. If not provided, the LaTeX content is returned as a string. Default is None.
index (bool, optional) – Whether to include the index column in the LaTeX output. Default is False.
bold_headers (bool, optional) – Whether the headers should be bolded in the LaTeX table. Default is False.
min_column_width (int, optional) – The minimum column width for table cells. Default is current value set on attribute
SQLDataModel.min_column_width
.max_column_width (int, optional) – The maximum column width for table cells. Default is current value set on attribute
SQLDataModel.max_column_width
.format_output_as (Literal['table', 'document']), optional) – Whether the output should be formatted as a LaTeX table or as a standalone document. Default is ‘table’.
column_alignment (Literal['left', 'center', 'right', 'dynamic'], optional) – The alignment for table columns. Default is current value set on attribute
SQLDataModel.column_alignment
.
- Returns:
If
filename
is None, returns the LaTeX table as a string.None
: Iffilename
is provided, writes the LaTeX table to the specified file and returns None.- Return type:
str
- Raises:
TypeError – If the
filename
argument is not of type ‘str’,index
argument is not of type ‘bool’,min_column_width
ormax_column_width
argument is not of type ‘int’.ValueError – If
format_output_as
is not one of ‘table’, ‘document’, orcolumn_alignment
provided and is not one of ‘left’, ‘center’, ‘right’, ‘dynamic’.Exception – If there is an OS related error encountered when opening or writing to the provided
filename
.
- LaTeX Formatting:
LaTeX output format that is generated can be set by
format_output_as
which provides one of two formats:'table'
: Output formatted as insertable table, beginning and ending with LaTeX\begin{table}
and\end{table}
respectively.'document'
: Output formatted as standalone document, beginning and ending with LaTeX\begin{document}
and\end{document}
respectively.
LaTeX table alignment will follow the
SQLDataModel
instance alignment, set bySQLDataModel.set_column_alignment()
:'dynamic'
: Dynamically aligns column content, right for numeric types and left for remaining types.'left'
: Left-aligns all column content, equivalent to LaTeX column format:|l|
.'center'
: Center-aligns all column content preferring left on uneven splits, equivalent to LaTeX column format:|c|
.'right'
: Right-aligns all column content, equivalent to LaTeX column format:|r|
.
The LaTeX rows generated will use
dynamic
alignment regardless ofcolumn_alignment
provided, this will not affect the rendered alignment but will maintain consistent format without affecting the actual alignment rendered by LaTeX.
Examples:
Returning LaTeX Literal
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Michael', 35, 185.8) ] # Create the model sdm = SQLDataModel(data=data, headers=headers) # Generate LaTeX table literal latex_output = sdm.to_latex() # View LaTeX output print(latex_output)
This will output:
\begin{tabular}{|l|r|r|} \hline {Name} & {Age} & {Height} \ \hline John & 30 & 175.30 \ Alice & 28 & 162.00 \ Michael & 35 & 185.80 \ \hline \end{tabular}
Write to LaTeX File
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Michael', 35, 185.8) ] # Create the model sdm = SQLDataModel(data=data, headers=headers) # Write the output to the file, formatting the output as a proper LaTeX document latex_table = sdm.to_latex(filename='Table.tex', format_output_as='document')
Contents of file
Table.tex
:\documentclass{article} \begin{document} \begin{table}[h] \centering \begin{tabular}{|l|r|r|} \hline {Name} & {Age} & {Height} \ \hline John & 30 & 175.30 \ Alice & 28 & 162.00 \ Michael & 35 & 185.80 \ \hline \end{tabular} \end{table} \end{document}
- Change Log:
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
A
\centering
command is included in the LaTeX output by default regardless of alignments specified.LaTeX headers and rows are indented by four spaces to keep with conventional table syntax and to distinguish the table data from commands.
- to_list(index: bool = False, include_headers: bool = False) list
Returns the current
SQLDataModel
data as a 1-dimensional list of values if data dimensions are compatible with flattening, or as a list of lists if data is 2-dimensional. Data is returned without index or headers by default, useindex = True
orinclude_headers = True
to modify.- Parameters:
index (bool, optional) – If True, includes the index in the result, if False, excludes the index. Default is False.
include_headers (bool, optional) – If True, includes column headers in the result, if False, excludes headers. Default is False.
- Returns:
The flattened list of values corresponding to the model data.
- Return type:
list
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('Beth', 27, 172.4), ('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8) ] # Create the model sdm = SQLDataModel(data, headers) # Get all model data as a list of lists model_data = sdm.to_list() # Iterate over each row for row in model_data: print(row)
This will output:
['Beth', 27, 172.4] ['John', 30, 175.3] ['Alice', 28, 162.0] ['Travis', 35, 185.8]
Data will be flattened into a single dimension if possible, such as when accessing individual columns:
# Get 'Name' column as a list col_data = sdm['Name'].to_list() # View output print(col_data)
This will output a list containing the values from each row for the column:
['Beth', 'John', 'Alice', 'Travis']
Data will also be flattened when accessing individual rows:
# Get first row as a list with index row_data = sdm[0].to_list(index=True) # View result print(row_data)
This will output the row’s values including the index:
[0, 'Beth', 27, 172.4]
- Change Log:
- Version 0.5.0 (2024-05-09):
Modified behavior to output 1-dimensional list when possible and a list of lists when not possible.
Changed default to
index = False
to increase surface for 1-dimensional flattening.
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
See
SQLDataModel.data()
to return the equivalent ofcursor.fetchall()
with data as a list of tuples.See
SQLDataModel.iter_rows()
to generate an iterable over the model data, which is preferred wherever possible.
- to_local_db(filename: str) None
Writes the
SQLDataModel
in-memory database to disk as a SQLite database file using the specified filename.- Parameters:
filename (str) – The filename or filepath to use when writing the model to disk.
- Raises:
TypeError – If
filename
is provided and is not of type ‘str’ representing a valid sqlite database save path.sqlite3.Error – If there is an issue with the SQLite database operations during backup.
- Returns:
None
Example:
import sqlite3 from SQLDataModel import SQLDataModel # Sample data data = [('Alice', 20, 'F'), ('Billy', 25, 'M'), ('Chris', 30, 'M')] # Create the model sdm = SQLDataModel(data, headers=['Name','Age','Sex']) # Filename to use for database db_file = "model.db" # Write the in-memory database model to disk sdm.to_local_db(db_file) # Loading the model back from disk can now be done at anytime sdm = SQLDataModel.from_sql("sdm", sqlite3.connect(db_file)) # View restored model print(sdm)
This will output the model we originally created:
┌───┬───────┬─────┬─────┐ │ │ Name │ Age │ Sex │ ├───┼───────┼─────┼─────┤ │ 0 │ Alice │ 20 │ F │ │ 1 │ Billy │ 25 │ M │ │ 2 │ Chris │ 30 │ M │ └───┴───────┴─────┴─────┘ [3 rows x 3 columns]
- Change Log:
- Version 0.5.2 (2024-05-13):
Renamed
db
parameter tofilename
for package consistency and to avoid confusion between similarily named database objects.Changed
filename
from keyword to positional argument making it a required parameter to avoid accidental overwriting.
Note
Use any compatible SQL API to load the resulting database file or use
SQLDataModel.from_sql()
to reload it back into aSQLDataModel
.Table name is determined by value at
SQLDataModel.sql_model
which is set to'sdm'
by default, useSQLDataModel.set_model_name()
to modify.
- to_markdown(filename: str = None, index: bool = False, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = None) str | None
Returns the current
SQLDataModel
as a markdown table literal iffilename
is None, otherwise writes the table to the provided file as markdown.- Parameters:
filename (str, optional) – The name of the file to write the Markdown content. If not provided, the Markdown content is returned as a string. Default is None.
index (bool, optional) – Whether to include the index column in the Markdown output. Default is False.
min_column_width (int, optional) – The minimum column width for table cells. Default is current value set on
SQLDataModel.min_column_width
.max_column_width (int, optional) – The maximum column width for table cells. Default is current value set on
SQLDataModel.max_column_width
.float_precision (int, optional) – The precision for floating-point values. Default is current value set on
SQLDataModel.display_float_precision
.column_alignment (Literal['dynamic', 'left', 'center', 'right'], optional) – The alignment for table columns. Default is current value set on
SQLDataModel.column_alignment
.'dynamic'
: Dynamically aligns column content, right for numeric types and left for remaining types.'left'
: Left-aligns all column content.'center'
: Center-aligns all column content preferring left on uneven splits.'right'
: Right-aligns all column content.
- Raises:
TypeError – If the
filename
argument is not of type ‘str’,index
argument is not of type ‘bool’,min_column_width
ormax_column_width
argument is not of type ‘int’.ValueError – If the
column_alignment
argument is provided and is not one of ‘dynamic’, ‘left’, ‘center’, or ‘right’.Exception – If there is an OS related error encountered when opening or writing to the provided
filename
.
- Returns:
If
filename
is None, returns the Markdown table as a string, iffilename
is provided, writes the Markdown table to the specified file and returns None.- Return type:
str
orNone
- Column Alignment:
'dynamic'
: Dynamically aligns column content, right for numeric types and left for remaining types.'left'
: Left-aligns all column content.'center'
: Center-aligns all column content preferring left on uneven splits.'right'
: Right-aligns all column content.
Examples:
To Markdown Literal
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Michael', 35, 185.8) ] # Create the model sdm = SQLDataModel(data=data, headers=headers) # Generate markdown table literal markdown_table = sdm.to_markdown() # View markdown output print(markdown_table)
This will output:
| Name | Age | Height | |:--------|-----:|--------:| | John | 30 | 175.30 | | Alice | 28 | 162.00 | | Michael | 35 | 185.80 |
Write to Markdown File
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Michael', 35, 185.8) ] # Create the model sdm = SQLDataModel(data=data, headers=headers) # Write the output to the file, center-aligning all columns sdm.to_markdown(filename='Table.MD', column_alignment='center')
Contents of
Table.MD
:| Name | Age | Height | |:--------|-----:|--------:| | John | 30 | 175.30 | | Alice | 28 | 162.00 | | Michael | 35 | 185.80 |
- Change Log:
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
All markdown output will contain the alignment characters
':'
as determined by theSQLDataModel.column_alignment
attribute or parameter.Any exception encountered during file read or writing operations is caught and reraised, see related
SQLDataModel.from_markdown()
.
- to_numpy(index: bool = False, include_headers: bool = False) ndarray
Converts
SQLDataModel
to a NumPyndarray
object of shape(rows, columns)
. Note that thenumpy
package must be installed to use this method.- Parameters:
index (bool, optional) – If True, includes the model index in the result. Default is False.
include_headers (bool, optional) – If True, includes column headers in the result. Default is False.
- Raises:
ModuleNotFoundError – If NumPy is not installed.
- Returns:
The model’s data converted into a NumPy array.
- Return type:
numpy.ndarray
Example:
import numpy from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8) ] # Create the numpy array with default parameters, no indicies or headers result_array = sdm.to_numpy() # View array print(result_array)
This will output:
[['John' '30' '175.3'] ['Alice' '28' '162.0'] ['Travis' '35' '185.8']]
Model headers can also be retained:
# Create the numpy array with with indicies and headers result_array = sdm.to_numpy(index=True, include_headers=True) # View array print(result_array)
This will output:
[['idx' 'Name' 'Age' 'Height'] ['0' 'John' '30' '175.3'] ['1' 'Alice' '28' '162.0'] ['2' 'Travis' '35' '185.8']]
- Change Log:
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
Output will always be a 2-dimensional array of type
numpy.ndarray
- to_pandas(index: bool = False, include_headers: bool = True) DataFrame
Converts
SQLDataModel
to a PandasDataFrame
object. Note that thepandas
package must be installed to use this method.- Parameters:
index (bool, optional) – If True, includes the model index in the result. Default is False.
include_headers (bool, optional) – If True, includes column headers in the result. Default is True.
- Raises:
ModuleNotFoundError – If Pandas is not installed.
- Returns:
The model’s data converted to a Pandas DataFrame.
- Return type:
pandas.DataFrame
Example:
import pandas from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8) ] # Create the model sdm = SQLDataModel(data, headers) # Convert the model to a pandas df df = sdm.to_pandas(include_headers=True, index=True) # View result print(df)
This will output:
Name Age Height 0 John 30 175.3 1 Alice 28 162.0 2 Travis 35 185.8
- Change Log:
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
SQLDataModel uses different data types than those used in
pandas
, seeSQLDataModel.set_column_dtypes()
for more information about casting rules.
- to_parquet(filename: str, **kwargs) None
Writes the current SQLDataModel to the specified parquet filename.
- Parameters:
filename (str) – The file path to save the parquet file, e.g.,
filename = 'user/data/output.parquet'
.**kwargs – Additional keyword arguments to pass to the pyarrow
write_table
function.
- Raises:
ModuleNotFoundError – If the required package
pyarrow
is not installed as determined by_has_pa
flag.TypeError – If the
filename
argument is not of type ‘str’ representing a valid parquet file path.Exception – If any unexpected exception occurs during the parquet writing process.
- Returns:
If successful, a new parquet file
filename
is created andNone
is returned.- Return type:
None
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Rate'] data = [('Alice', 25, 26.50), ('Bob', 30, 21.25), ('Will', 35, 24.00)] # Create the model sdm = SQLDataModel(data,headers, display_index=False) # Parquet file pq_file = "output.parquet" # Write the model as parquet file sdm.to_parquet(pq_file) # Confirm result by reading back file sdm_result = SQLDataModel.from_parquet(pq_file) # View model print(sdm_result)
This will output:
┌───────┬──────┬────────┐ │ Name │ Age │ Rate │ ├───────┼──────┼────────┤ │ Alice │ 25 │ 26.50 │ │ Bob │ 30 │ 21.25 │ │ Will │ 35 │ 24.00 │ └───────┴──────┴────────┘ [3 rows x 3 columns]
Note
The
pyarrow
package is required to use this method as well as theSQLDataModel.from_parquet()
method.The
SQLDataModel.to_dict()
method is used prior to writing to parquet to convert theSQLDataModel
into a dictionary suitable for parquet Table format.Exceptions raised by the
pyarrow
package and its methods are caught and reraised when encountered to keep with package error formatting.
- to_pickle(filename: str = None) None
Save the
SQLDataModel
instance to the specifiedfilename
as a pickle object.- Parameters:
filename (str, optional) – The file name to save the model to. If None, the invoking Python file’s name with a “.sdm” extension will be used.
- Raises:
TypeError – If filename is provided but is not of type ‘str’ representing a valid pickle filepath.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel headers = ['idx', 'first', 'last', 'age'] data = [ (0, 'john', 'smith', 27) ,(1, 'sarah', 'west', 29) ,(2, 'mike', 'harlin', 36) ,(3, 'pat', 'douglas', 42) ] # Create the SQLDataModel object sdm = SQLDataModel(data, headers) # Save the model's data as a pickle file "output.sdm" sdm.to_pickle("output.sdm") # Alternatively, leave blank to use the current file's name: sdm.to_pickle() # This way the same data can be recreated later by calling the from_pickle() method from the same project: sdm = SQLDataModel.from_pickle()
Note
All data, headers, data types and display properties will be saved when pickling.
If no
filename
argument is provided, then the invoking module’s__name__
property will be used by default.
- to_polars(index: bool = False, include_headers: bool = True) _pl.DataFrame
Converts
SQLDataModel
to a PolarsDataFrame
object. Note that thepolars
package must be installed to use this method.- Parameters:
index (bool, optional) – If True, includes the model index in the result. Default is False.
include_headers (bool, optional) – If True, includes column headers in the result. Default is True.
- Raises:
ModuleNotFoundError – If Polars is not installed.
- Returns:
The model’s data converted to a Polars DataFrame.
- Return type:
polars.DataFrame
Example:
import polars from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('Beth', 27, 172.4), ('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8) ] # Create the model sdm = SQLDataModel(data, headers) # Convert the model to a polars df with the index df = sdm.to_polars(index=True) # View result print(df)
This will output:
shape: (4, 4) ┌─────┬────────┬─────┬────────┐ │ idx ┆ Name ┆ Age ┆ Height │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ str ┆ i64 ┆ f64 │ ╞═════╪════════╪═════╪════════╡ │ 0 ┆ Beth ┆ 27 ┆ 172.4 │ │ 1 ┆ John ┆ 30 ┆ 175.3 │ │ 2 ┆ Alice ┆ 28 ┆ 162.0 │ │ 3 ┆ Travis ┆ 35 ┆ 185.8 │ └─────┴────────┴─────┴────────┘
Note
See related
SQLDataModel.from_polars()
for the inverse method of converting a PolarsDataFrame
object into to aSQLDataModel
.SQLDataModel uses different data types than those used in
polars
, seeSQLDataModel.set_column_dtypes()
for more information about casting rules.Polars does not really have a concept of an index column, therefore when using
index=True
, the SQLDataModel index is just an additional column in the returned DataFrame object.
- to_pyarrow(index: bool = False) Table
Returns the current
SQLDataModel
in Apache Arrow columnar format as apyarrow.Table
.- Parameters:
index (bool, optional) – Specifies whether to include the index of the SQLDataModel in the resulting Table. Default is to False.
- Raises:
ModuleNotFoundError – If the required package
pyarrow
is not installed.Exception – If any unexpected exception occurs during the pyarrow conversion process.
- Returns:
A table representing the current
SQLDataModel
in Apache Arrow columnar format.- Return type:
pyarrow.Table
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Grade'] data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2)] # Create the model sdm = SQLDataModel(data, headers) # Create the pyarrow table table = sdm.to_pyarrow() # View result print(table)
This will output the
pyarrow
object details:pyarrow.Table Name: string Age: int64 Grade: double ---- Name: [["Alice","Bob","Charlie"]] Age: [[25,30,35]] Grade: [[3.8,3.9,3.2]]
- Change Log:
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
Unmodified python types will follow conversion and casting rules specified in
pyarrow
implementation, for the modifieddate
anddatetime
types,date32[day]
andtimestamp[us]
will be used, respectively.
- to_sql(table: str, con: Connection | Any, *, schema: str = None, if_exists: Literal['fail', 'replace', 'append'] = 'fail', index: bool = True, primary_key: str | int = None) None
Insert the
SQLDataModel
into the specified table using the provided database connection.- Supported Connection APIs:
SQLite using
sqlite3
PostgreSQL using
psycopg2
SQL Server ODBC using
pyodbc
Oracle using
cx_Oracle
Teradata using
teradatasql
- Parameters:
table (str) – The name of the table where data will be inserted.
con (sqlite3.Connection | Any) – The database connection object. Supported connection APIs are
sqlite3
,psycopg2
,pyodbc
,cx_Oracle
,teradatasql
schema (str, optional) – The schema to use for PostgreSQL and ODBC SQL Server connections, ignored otherwise. Default is None.
if_exists (Literal['fail', 'replace', 'append'], optional) – Action to take if the table already exists. If
fail
an error is raised if table exists and no inserts occur. Ifreplace
any existing table is dropped prior to inserts. Ifappend
existing table is appended to by subsequent inserts.index (bool, optional) – If the model index should be included in the target table. Default is True.
primary_key (str | int, optional) – Column name or index to use as table primary key. Default is None, using the index column as the primary key when
index=True
.
- Raises:
SQLProgrammingError – If an error occurs during cursor accessing, table creation or data insertion into the database.
ValueError – If specified
table
already exists when usingif_exists='fail'
or ifcon
is not one of the currently supported connection modules.IndexError – If
primary_key
is provided as anint
representing a column index but is out of range of the current modelSQLDataModel.column_count
.TypeError – If
primary_key
argument provided is not of type ‘str’ or ‘int’ representing a valid column name or index to use as the primary key column for the target table.
- Returns:
None
Example:
import sqlite3 from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Grade'] data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2), ('David', 28, 3.4)] # Create the model sdm = SQLDataModel(data, headers) # Create connection object sqlite_db_conn = sqlite3.connect('students.db') # Basic usage, creating a new table sdm.to_sql('users', sqlite_db_conn)
This will create a new table
users
, or fail if one already exists:sqlite> select * from users; idx Name Age Grade --- ------- --- ----- 0 Alice 25 3.8 1 Bob 30 3.9 2 Charlie 35 3.2 3 David 28 3.4
Connect to PostgreSQL, SQL Server, Oracle or Teradata:
import psycopg2 from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Grade'] data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2), ('David', 28, 3.4)] # Create the model sdm = SQLDataModel(data, headers) # Setup the connection, whether using psycopg2 or other supported modules like pyodbc con = psycopg2.connect(...) # Create or replace existing table in database sdm.to_sql('users', con, if_exists='replace', index=False)
This will result in a new table
users
in our PostgreSQL database:=> select * from users; Name | Age | Grade | --------+-----+-------+ Alice | 25 | 3.8 | Bob | 30 | 3.9 | Charlie | 35 | 3.2 | David | 28 | 3.4 |
For SQL Server connections using
pyodbc
, the example would be almost identical except for whichcon
object we use:import pyodbc # For SQL Server ODBC connections using pyodbc con = pyodbc.connect(...)
The same is true for Oracle and other connections:
import cx_Oracle # For Oracle connections using cx_Oracle con = cx_Oracle.connect(...)
Using a Primary Key
from SQLDataModel import SQLDataModel # Sample data headers = ['ID', 'User'] data = [(1001, 'Alice'), (1002, 'Bob'), (1003, 'Charlie'), (1004, 'David')] # Create the model sdm = SQLDataModel(data, headers) # Create connection object sqlite_db_conn = sqlite3.connect('students.db') # Create the table using the 'ID' column as the primary key sdm.to_sql('users', sqlite_db_conn, if_exists='replace', index=False, primary_key='ID')
This will create a
users
table with the schema:sqlite> .schema users CREATE TABLE "users" ( "ID" INTEGER PRIMARY KEY, "User" TEXT);
With the
ID
column as its primary key:sqlite> select * from users; ID User ---- ------- 1001 Alice 1002 Bob 1003 Charlie 1004 David
If table creation is necessary, column types will be mapped according to the destination database by the following conversion:
┌─────────────────┬─────────┬─────────┬────────┬─────────┬────────────────┬──────┬───────────┐ │ Database \ Type │ NULL │ INTEGER │ REAL │ TEXT │ BLOB │ DATE │ TIMESTAMP │ ├─────────────────┼─────────┼─────────┼────────┼─────────┼────────────────┼──────┼───────────┤ │ PostgreSQL │ UNKNOWN │ INTEGER │ FLOAT │ TEXT │ BYTEA │ DATE │ TIMESTAMP │ │ SQL Server ODBC │ UNKNOWN │ INTEGER │ FLOAT │ TEXT │ VARBINARY(MAX) │ DATE │ DATETIME │ │ Oracle │ UNKNOWN │ NUMBER │ NUMBER │ VARCHAR │ BLOB │ DATE │ DATETIME │ │ Teradata │ UNKNOWN │ INTEGER │ FLOAT │ VARCHAR │ BYTE │ DATE │ DATETIME │ │ SQLite │ NULL │ INTEGER │ REAL │ TEXT │ BLOB │ DATE │ TIMESTAMP │ └─────────────────┴─────────┴─────────┴────────┴─────────┴────────────────┴──────┴───────────┘ [5 rows x 8 columns]
- Change Log:
- Version 0.3.0 (2024-03-31):
Renamed arguments
extern_con
:con
,replace_existing
:if_exists
,include_index
:index
.Added
primary_key
argument for specifying a primary key column for table schema.Added
schema
argument for specifying a target schema for the table.
Note
When providing a
primary_key
column it will be assumed unique and the model will not perform any unique-ness constraints.Any
con
or connection object provided is assumed to be open and valid, if a cursor cannot be created from the object an exception will be raised.Connections with write access can be used in the
SQLDataModel.to_sql()
method for writing to the same connection types, be careful.ValueError will be raised if
table
already exists, useif_exists = 'replace'
orif_exists = 'append'
to instead replace or append to the table.See relevant module documentation for additional details or information pertaining to specific database or connection dialect being used.
See related
SQLDataModel.from_sql()
for creatingSQLDataModel
from existing SQL database connections.
- to_text(filename: str = None, index: bool = None, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = None, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round'] = None, display_dimensions: bool = False) str | None
Returns a textual representation of the current
SQLDataModel
as a string literal or by writing to file if afilename
is provided.- Parameters:
filename (str, optional) – The name of the file to write the text content. If provided, writes the text to the specified file. Default is None.
index (bool, optional) – Whether to include the index column in the text output. Default is value set on
SQLDataModel.display_index
.min_column_width (int, optional) – The minimum column width for table cells. Default is value set on
SQLDataModel.min_column_width
.max_column_width (int, optional) – The maximum column width for table cells. Default is value set on
SQLDataModel.max_column_width
.float_precision (int, optional) – The precision for floating-point values. Default is value set on
SQLDataModel.display_float_precision
.column_alignment (Literal['dynamic', 'left', 'center', 'right'], optional) – The alignment for table columns. Default is value set on
SQLDataModel.column_alignment
. Use'dynamic'
dynamically aligns column content, right for numeric types and left for remaining types. Use'left'
left-aligns all column content. Use'center'
center-aligns all column content. Use'right'
right-aligns all column content.table_style (Literal['ascii','bare','dash','default','double','list','markdown','outline','pandas','polars','postgresql','round'], optional) – The table styling to use. Default is value set on
SQLDataModel.table_style
.display_dimensions (bool, optional) – Whether to include the model dimensions
[N rows x N cols]
in the text output. Default is False.
- Raises:
TypeError – If arguments are provided but are not the correct types:
filename
(str),index
(bool),min_column_width
(int),max_column_width
(int),float_precision
(int).ValueError – If the
column_alignment
argument is provided and is not one of ‘dynamic’, ‘left’, ‘center’, or ‘right’.Exception – If there is an OS related error encountered when opening or writing to the provided
filename
.
- Returns:
If
filename
is None, returns the textual representation as a string. Iffilename
is provided, writes the textual representation to the specified file and returns None.- Return type:
str
orNone
Examples:
Returning Text Literal
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Michael', 35, 185.8) ] # Create the model sdm = SQLDataModel(data=data, headers=headers) # Generate text table literal text_table = sdm.to_text() # View output print(text_table)
This will output:
┌─────────┬──────┬────────┐ │ Name │ Age │ Height │ ├─────────┼──────┼────────┤ │ John │ 30 │ 175.3 │ │ Alice │ 28 │ 162.0 │ │ Michael │ 35 │ 185.8 │ └─────────┴──────┴────────┘
Write to File
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Michael', 35, 185.8) ] # Create the model sdm = SQLDataModel(data=data, headers=headers) # Write the output to the file, center-aligning all columns sdm.to_text(filename='Table.txt', column_alignment='center')
Contents of
Table.txt
:┌───┬─────────┬──────┬────────┐ │ │ Name │ Age │ Height │ ├───┼─────────┼──────┼────────┤ │ 0 │ John │ 30 │ 175.3 │ │ 1 │ Alice │ 28 │ 162.0 │ │ 2 │ Michael │ 35 │ 185.8 │ └───┴─────────┴──────┴────────┘
Important
Unlike output from
print(sdm)
or other calls toSQLDataModel.__repr__()
, the output from this method includes the fullSQLDataModel
and is not restricted by current terminal size or the value set atSQLDataModel.display_max_rows
. As such, horizontal truncation only occurs on cell values as determined bymax_column_width
and no other horizontal or vertical table-wide truncation is performed.- Change Log:
- Version 0.3.10 (2024-04-16):
Added
table_style
parameter and updated output to reflect new formatting styles introduced in version 0.3.9.Added
display_dimensions
parameter to allow toggling display of table dimensions in output.
- Version 0.3.0 (2024-03-31):
Renamed
include_index
parameter toindex
for package consistency.
Note
See
SQLDataModel.set_table_style()
for modifying table format and available styles.If
filename
is provided, the method writes the text to the specified file; otherwise, it returns the textual representation as a string.If
index
isNone
, the method uses the current value onSQLDataModel.display_index
.If
min_column_width
isNone
, the method uses the current value onSQLDataModel.min_column_width
.If
max_column_width
isNone
, the method uses the current value onSQLDataModel.max_column_width
.If
float_precision
isNone
, the method uses the current value onSQLDataModel.display_float_precision
.If
column_alignment
isNone
, the method uses the current value onSQLDataModel.column_alignment
.If
table_style
isNone
, the method uses the current value onSQLDataModel.table_style
.
- transpose(infer_types: bool = True, include_headers: bool = False) SQLDataModel
Transposes the model and returns as a new
SQLDataModel
.- Parameters:
infer_types (bool, optional) – If types should be inferred after the transposition. Defaults to True.
include_headers (bool, optional) – If headers are included in the transposed data. Defaults to False.
- Returns:
The transposition of the model as a new SQLDataModel instance.
- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Create the model sdm = SQLDataModel([('A1', 'A2'), ('B1', 'B2'), ('C1', 'C2')]) # Transpose it sdm_transposed = sdm.transpose() # View original print(f"Original:\n{sdm}") # Along with transposed print(f"Transposed:\n{sdm_transposed}")
This will output the result of the transposition:
Original: ┌───┬─────┬─────┐ │ │ 0 │ 1 │ ├───┼─────┼─────┤ │ 0 │ A1 │ A2 │ │ 1 │ B1 │ B2 │ │ 2 │ C1 │ C2 │ └───┴─────┴─────┘ [3 rows x 2 columns] Transposed: ┌───┬─────┬─────┬─────┐ │ │ 0 │ 1 │ 2 │ ├───┼─────┼─────┼─────┤ │ 0 │ A1 │ B1 │ C1 │ │ 1 │ A2 │ B2 │ C2 │ └───┴─────┴─────┴─────┘ [2 rows x 3 columns]
Note
When
infer_types=False
, the first row of the transposed result will be used to set thedtypes
of the new model. This is generally a poor choice considering the nature of transposing data.If
include_headers=True
, the headers will be included as the first row in the transposed data.Running this method sequentially should return the original model,
sdm == sdm.transpose().transpose()
- update_index_at(row_index: int, column_index: int | str, value: Any = None) None
Updates a specific cell in the
SQLDataModel
at the given row and column indices with the provided value.- Parameters:
row_index (int) – The index of the row to be updated.
column_index (int or str) – The index or name of the column to be updated.
``value –
The new value to be assigned to the specified cell.
- Raises:
TypeError – If
row_index
is not of type ‘int’ or ifcolumn_index
is not of type ‘int’ or ‘str’.IndexError – If row or column provided as an ‘int’ but is outside of the current model row or column range.
ValueError – If column provided as a ‘str’ but is not found in the current model headers.
SQLProgrammingError – If there is an issue with the SQL execution during the update.
- Returns:
None
Example:
from SQLDataModel import SQLDataModel # Create an initial 3x3 model filled with dashes sdm = SQLDataModel.from_shape((3,3), fill='---', headers=['A', 'B', 'C']) # Update cell based on integer indicies sdm.update_index_at(0, 0, 'Top Left') sdm.update_index_at(0, 2, 'Top Right') # Update cell based on row index and column name sdm.update_index_at(2, 'A', 'Bottom Left') sdm.update_index_at(2, 'C', 'Bottom Right') # Update based on negative row and column indexing sdm.update_index_at(-2, -2, 'Center') # View result print(sdm)
This will output cumulative result of our updates:
┌───┬─────────────┬────────┬──────────────┐ │ │ A │ B │ C │ ├───┼─────────────┼────────┼──────────────┤ │ 0 │ Top Left │ --- │ Top Right │ │ 1 │ --- │ Center │ --- │ │ 2 │ Bottom Left │ --- │ Bottom Right │ └───┴─────────────┴────────┴──────────────┘ [3 rows x 3 columns]
- Change Log:
- Version 0.5.2 (2024-05-13):
Modified
row_index
parameter to useSQLDataModel.indicies
to index into rows in lieu of row index value equality.
Important
Indexing is done using zero-based integers and not done by index value. Most of the time this distinction is irrelevant as the row index at position ‘0’ will have an index value of ‘0’, however this can change after transformation operations like filter or sort. To reset and realign the index value use
SQLDataModel.reset_index()
or useSQLDataModel.indicies
to view the current row indicies.
Note
This method only updates individual cells in the current model based on integer indexing for both rows and columns using their (row, column) position.
To broadcast updates across row and column dimensions use the syntax of
sdm[row, column] = value
or seeSQLDataModel.__setitem__()
for more details.
- validate_indicies(indicies) tuple[int | slice, list[str]]
Validates and returns a predictable notation form of indices for accessing rows and columns in the
SQLDataModel
from varying indexing input types.- Row indexing:
int: Single row index.
slice: Range of row indices.
set: Discontiguous row indicies.
tuple: Like set, discontiguous row indices.
- Column indexing:
str: Single column name.
list: List of column names.
tuple[int|slice, str|list]: Two-dimensional indexing with rows and columns.
- Parameters:
indicies – Specifies the indices for rows and columns.
- Raises:
TypeError – If the type of indices is invalid such as a float for row index or a boolean for a column name index.
ValueError – If the indices are outside the current model range or if a column is not found in the current model headers when indexed by column name as
str
.IndexError – If the column indices are outside the current column range or if a column is not found in the current model headers when indexed by
int
.
- Returns:
A tuple containing validated row indices and column indices.
- Return type:
tuple
Example:
from SQLDataModel import SQLDataModel headers = ['idx', 'first', 'last', 'age'] data = [ (0, 'john', 'smith', 27) ,(1, 'sarah', 'west', 29) ,(2, 'mike', 'harlin', 36) ,(3, 'pat', 'douglas', 42) ] # Create the sample model sdm = SQLDataModel(data,headers) # Example 1: Validate a single row index validated_row_index, validated_columns = sdm.validate_indicies(3) # Example 2: Validate a range of row indices and a list of column names validated_row_indices, validated_columns = sdm.validate_indicies((0, 2, 3), ['first', 'last']) # Example 3: Validate a slice for row indices and a single column name validated_row_indices, validated_columns = sdm.validate_indicies(slice(1, 2), 'col_3') # Example 4: Validate two-dimensional indexing with rows and columns validated_row_indices, validated_columns = sdm.validate_indicies((slice(0, 3), ['first', 'last']))
Note
For two-dimensional indexing, the first element represents rows, and the second element represents columns.
Strict validation ensures that column names are checked against the current model headers.
- vstack(*other: SQLDataModel, inplace: bool = False) SQLDataModel
Vertically stacks one or more
SQLDataModel
objects to the current model.- Parameters:
other (SQLDataModel or sequence of) – The SQLDataModel objects to vertically stack.
inplace (bool, optional) – If True, performs the vertical stacking in-place, modifying the current model. Defaults to False, returning a new
SQLDataModel
.
- Returns:
The vertically stacked SQLDataModel instance when inplace is False.
- Return type:
SQLDataModel
- Raises:
ValueError – If no additional SQLDataModels are provided for vertical stacking.
TypeError – If any argument in ‘other’ is not of type SQLDataModel, list, or tuple.
SQLProgrammingError – If an error occurs when updating the model values in place.
Example:
from SQLDataModel import SQLDataModel # Create models A and B sdm_a = SQLDataModel([('A', 1), ('B', 2)], headers=['A1', 'A2']) sdm_b = SQLDataModel([('C', 3), ('D', 4)], headers=['B1', 'B2']) # Vertically stack B onto A sdm_ab = sdm_a.vstack(sdm_b) # View stacked model print(sdm_ab)
This will output the result of stacking B onto A, using the base model columns and dtypes:
┌─────┬─────┐ │ A1 │ A2 │ ├─────┼─────┤ │ A │ 1 │ │ B │ 2 │ │ C │ 3 │ │ D │ 4 │ └─────┴─────┘ [4 rows x 2 columns]
Multiple models can be stacked simultaneously, here we vertically stack 3 models:
# Create a third model C sdm_c = SQLDataModel([('E', 5), ('F', 6)], headers=['C1', 'C2']) # Vertically stack all three models sdm_abc = sdm_a.vstack([sdm_b, sdm_c]) # View stacked result print(sdm_abc)
This will output the result of stacking C and B onto A:
┌─────┬─────┐ │ A1 │ A2 │ ├─────┼─────┤ │ A │ 1 │ │ B │ 2 │ │ C │ 3 │ │ D │ 4 │ │ E │ 5 │ │ F │ 6 │ └─────┴─────┘ [6 rows x 2 columns]
Note
Headers and data types are inherited from the model calling the
SQLDataModel.vstack()
method, casting stacked values corresponding to the base model types.Model dimensions will be truncated or padded to coerce compatible dimensions when stacking, use
SQLDataModel.concat()
for strict concatenation instead of vstack.See
SQLDataModel.insert_row()
for inserting new values or types other thanSQLDataModel
directly into the current model.See
SQLDataModel.hstack()
for horizontal stacking.
- where(predicate: str) SQLDataModel
Filters the rows of the current
SQLDataModel
object based on the specified SQL predicate and returns a newSQLDataModel
containing only the rows that satisfy the condition. Only the predicates are needed as the statement prepends the select clause as “select [current model columns] where [predicate]”, see below for detailed examples.- Parameters:
predicate (str) – The SQL predicate used for filtering rows that follows the ‘where’ keyword in a normal SQL statement.
- Raises:
TypeError – If the provided
predicate
argument is not of typestr
.SQLProgrammingError – If the provided string is invalid or malformed SQL when executed against the model
- Returns:
A new
SQLDataModel
containing rows that satisfy the specified predicate.- Return type:
SQLDataModel
Example:
from SQLDataModel import SQLDataModel # Sample data headers = ['Name', 'Age', 'Job'] data = [ ('Billy', 30, 'Barber'), ('Alice', 28, 'Doctor'), ('John', 25, 'Technician'), ('Travis', 35, 'Musician'), ('William', 15, 'Student') ] # Create the model sdm = SQLDataModel(data, headers) # Filter model by 'Age' > 30 sdm_filtered = sdm.where('Age > 20') # View result print(sdm_filtered)
This will output:
┌───┬────────┬──────┬────────────┐ │ │ Name │ Age │ Job │ ├───┼────────┼──────┼────────────┤ │ 0 │ Billy │ 30 │ Barber │ │ 1 │ Alice │ 28 │ Doctor │ │ 2 │ John │ 25 │ Technician │ │ 3 │ Travis │ 35 │ Musician │ └───┴────────┴──────┴────────────┘ [4 rows x 3 columns]
Filter by multiple parameters:
# Filter by 'Job' and 'Age' sdm_filtered = sdm.where("Job = 'Student' and Age < 18") # View result print(sdm_filtered)
This will output:
┌───┬─────────┬──────┬─────────┐ │ │ Name │ Age │ Job │ ├───┼─────────┼──────┼─────────┤ │ 4 │ William │ 15 │ Student │ └───┴─────────┴──────┴─────────┘ [1 rows x 3 columns]
Note
predicate
can be any valid SQL, for example ordering can be acheived without any filtering by simple using the argument'(1=1) order by "age" asc'
If
predicate
is not of typestr
, aTypeError
is raised, if it is not valid SQL,SQLProgrammingError
will be raised.
ANSIColor
- class SQLDataModel.ANSIColor.ANSIColor(text_color: str | tuple = None, text_bold: bool = False)
Bases:
object
Creates an ANSI style terminal color using provided hex color or rgb values.
- Variables:
text_color (str or tuple) – Hex color code or RGB tuple.
text_bold (bool) – Whether text should be bold.
- Raises:
ValueError – If provided string is not a valid hex color code or if provided rgb tuple is invalid.
TypeError – If provided text_color or text_bold parameters are of invalid types.
Example:
import ANSIColor # Create a pen by specifying a color in hex or rgb: green_bold = ANSIColor("#00ff00", text_bold=True) # Create a string to use as a sample: regular_str = "Hello, World!" # Color the string using the ``wrap()`` method: green_str = green_bold.wrap(regular_str) # Print the string in the terminal to see the color applied: print(f"original string: {regular_str}, green string: {green_str}") # Get rgb values from existing color print(green_bold.to_rgb()) # Output: (0, 255, 0)
- static ErrorFormat(error: str) str
Formats an error message with ANSI color coding.
- Parameters:
error (str) – The error message to be formatted.
- Returns:
A string with ANSI color coding, highlighting the error type in bold red.
- Return type:
str
Example:
import ANSIColor # Error message to format formatted_error = ANSIColor.ErrorFormat("ValueError: Invalid value provided.") # Display alongside error or exception when raised print(formatted_error)
- __init__(text_color: str | tuple = None, text_bold: bool = False)
Initializes the ANSIColor object with the specified text color and bold setting.
- Parameters:
text_color (str or tuple) – Hex color code or RGB tuple (default: None, teal color used if not specified)
text_bold (bool) – Whether text should be bold (default: False)
Example:
import ANSIColor # Initialize from hex value with normal weight color = ANSIColor("#00ff00") # Initialize from rgb value with bold weight color = ANSIColor((0,255,0), text_bold=True)
- to_rgb() tuple
Returns the text color attribute as a tuple in the format (r, g, b).
- Returns:
RGB tuple.
- Return type:
tuple
Example:
import ANSIColor # Create the color color = ANSIColor("#00ff00") # Get the rgb values print(color.to_rgb()) # Output: (0, 255, 0)
- wrap(text: str) str
Wraps the provided text in the style of the pen.
- Parameters:
text (str) – Text to be wrapped.
- Returns:
Wrapped text with ANSI escape codes.
- Return type:
str
Example:
import ANSIColor # Create the color blue_color = ANSIColor("#0000ff") # Create a sample string message = "This string is currently unstyled" # Wrap the string to change its styling whenever its printed blue_message = blue_color.wrap(message) # Print the styled message print(blue_message) # Or style string or string object directly in the print statement print(blue_color.wrap("I'm going to turn blue!"))
HTMLParser
- class SQLDataModel.HTMLParser.HTMLParser(*, convert_charrefs: bool = True, cell_sep: str = ' ', table_identifier: int | str = 0)
Bases:
HTMLParser
An HTML parser designed to extract tables from HTML content.
This parser subclasses HTMLParser from the standard library to parse HTML content. It extracts tables from the HTML and provides methods to access the table data.
- Variables:
convert_charrefs (bool) – Flag indicating whether to convert character references to Unicode characters. Default is True.
cell_sep (str) – Separator string to separate cells within a row. Default is an empty string.
table_identifier (int or str) – Identifier used to locate the target table. It can be either an integer representing the table index, or a string representing the HTML ‘name’ or ‘id’ attribute of the table.
_in_td (bool) – Internal flag indicating whether the parser is currently inside a <td> tag.
_in_th (bool) – Internal flag indicating whether the parser is currently inside a <th> tag.
_current_table (list) – List to hold the current table being parsed.
_current_row (list) – List to hold the current row being parsed.
_current_cell (list) – List to hold the current cell being parsed.
_ignore_next (bool) – Internal flag indicating whether the next token should be ignored.
found_target (bool) – Flag indicating whether the target table has been found.
_is_finished (bool) – Internal flag indicating whether parsing is finished.
table_counter (int) – Counter to keep track of the number of tables encountered during parsing.
target_table (list) – List to hold the data of the target table once found.
- static ErrorFormat(error: str) str
Formats an error message with ANSI color coding.
- Parameters:
error (str) – The error message to be formatted.
- Returns:
A string with ANSI color coding, highlighting the error type in bold red.
- Return type:
str
Example:
import HTMLParser # Error message to format formatted_error = HTMLParser.ErrorFormat("ValueError: Invalid value provided.") # Display alongside error or exception when raised print(formatted_error)
- handle_data(data: str) None
Handle the data within an HTML tag during parsing.
- Parameters:
data (str) – The data contained within the HTML tag.
- handle_endtag(tag: str) None
Handle the end of an HTML tag during parsing and modify the parsing tags accordingly.
- Parameters:
tag (str) – The name of the HTML tag encountered.
- handle_starttag(tag: str, attrs: list[str]) None
Handle the start of an HTML tag during parsing.
- Parameters:
tag (str) – The name of the HTML tag encountered.
attrs (list[str]) – A list of (name, value) pairs representing the attributes of the tag.
- validate_table() None
Validate and retrieve the target HTML table data based on
table_identifier
used for parsing.- Returns:
A tuple containing the table data and headers (if present).
- Return type:
tuple[list, list|None]
- Raises:
ValueError – If the target table is not found or cannot be parsed.
Note
SQLDataModel.from_html()
uses this class to extract valid HTML tables from either web or file content.If a row is found with mismatched dimensions, it will be filled with
None
values to ensure tabular output.
JSONEncoder
- class SQLDataModel.JSONEncoder.DataTypesEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases:
JSONEncoder
Custom JSON encoder that extends the functionality of
json.JSONEncoder
to handle additional data types.- Serialization:
datetime.date
: Serialized as a string in the format ‘YYYY-MM-DD’.datetime.datetime
: Serialized as a string in the format ‘YYYY-MM-DD HH:MM:SS’.bytes
: Decoded to a UTF-8 encoded string.
Note
The date and datetime types can be recovered using SQLDataModels
infer_dtypes()
method.The bytes information is not decoded back into bytes.
- default(obj: Any)
Override the default method to provide custom serialization for specific data types.
- Parameters:
obj – The Python object to be serialized.
- Returns:
The JSON-serializable representation of the object.
StandardDeviation
- class SQLDataModel.StandardDeviation.StandardDeviation
Bases:
object
Implementation of standard deviation as an aggregate function for SQLite:
\[\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}\]- Where:
\(x_i\) represents each individual data point in the population.
\(\mu\) is the population mean.
\(N\) is the total number of data points in the population.
This class provides methods to calculate the standard deviation of a set of values in an SQLite query using the aggregate function mechanism.
- Variables:
M (float) – The running mean of the values.
S (float) – The running sum of the squared differences from the mean.
k (int) – The count of non-null input values.
Note
See
SQLDataModel.describe()
for statistical implementation.
- finalize() float
Compute the final standard deviation as part of
sqlite3
user-defined aggregate function.- Returns:
The computed standard deviation if the count is greater than or equal to 3, else None.
- Return type:
float
orNone
Note
This returns the population standard deviation, not sample standard deviation. It measures of the spread or dispersion of a set of data points within the population, using the entire population.
- step(value)
Update the running mean and sum of squared differences with a new value.
- Parameters:
value (float) – The input value to be included in the calculation.
Note
If the input value is None, it will be ignored.
Exceptions
- exception SQLDataModel.exceptions.DimensionError
Bases:
Exception
Raised when arguments provided to
SQLDataModel
are not of compatible dimensions, for example, trying to join a(10, 4)
dimensional model to a(7, 5)
dimensional model.- Variables:
message (str) – A detailed error message describing the dimension mismatch.
Example:
from SQLDataModel import SQLDataModel # Example headers and data headers = ['idx', 'first', 'last', 'age'] data = [ (0, 'john', 'smith', 27) ,(1, 'sarah', 'west', 29) ,(2, 'mike', 'harlin', 36) ,(3, 'pat', 'douglas', 42) ] # Create the sample model with correct dimensions sdm = SQLDataModel(data,headers) # This time with one less header, which raises `DimensionError` exception: try: sdm = SQLDataModel(data,headers[:-1]) except DimensionError as e: print(e) # Attempting to assign a row with incompatible shape which also raises `DimensionError` exception: try: sdm[1] = ['sarah', 'west', 30, 'new york'] except DimensionError as e: print(e)
Note
An argument could be made for using
ValueError
instead, but there’s enough difference to justify a new error.
- exception SQLDataModel.exceptions.SQLProgrammingError
Bases:
Exception
Raised when invalid or malformed SQL prevents the execution of a method or returns unexpected behavior, for example, trying to select a column that does not exist in the current model.
- Variables:
message (str) – A detailed error message describing the SQL programming error.
Example:
from SQLDataModel import SQLDataModel # Example headers and data headers = ['idx', 'first', 'last', 'age'] data = [ (0, 'john', 'smith', 27) ,(1, 'sarah', 'west', 29) ,(2, 'mike', 'harlin', 36) ,(3, 'pat', 'douglas', 42) ] # Create the sample model sdm = SQLDataModel(data,headers) # Query with invalid syntax to raise `SQLProgrammingError` exception: try: sdm = SQLDataModel.fetch_query("selct first, last from sdm where age > 30") except SQLProgrammingError as e: print(e) # Query for non-existing column to raise `SQLProgrammingError` exception: try: sdm = SQLDataModel.fetch_query("select first, last, date_of_birth from sdm") except SQLProgrammingError as e: print(e)
Note
This exception is used to wrap any
sqlite3.ProgrammingError
encountered during SQL related operations.