sqldatamodel
SQLDataModel
- class sqldatamodel.sqldatamodel.SQLDataModel(data: list[list] = None, headers: list[str] = None, dtypes: dict[str, str] = None, display_max_rows: int = None, min_column_width: int = 3, max_column_width: int = 38, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic', display_color: str = None, display_index: bool = True, display_float_precision: int = 2, infer_types: bool = False, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = 'default')
Bases: object
SQLDataModel
Primary class for the package of the same name. It is meant to provide a fast, lightweight alternative to the common pandas, numpy and sqlalchemy setup for moving data in a source- and destination-agnostic manner. It is not an ORM: any modification beyond basic joins, group-bys and table alterations requires knowledge of SQL. The primary use case envisaged by the package is one where a table needs to be ETL’d from location A to destination B, with arbitrary modifications made if needed:
Summary
Extract your data from SQL, websites or HTML, parquet, JSON, CSV, pandas, numpy, pickle, python dictionaries, lists, etc.
Transform your data using raw SQL or any number of built-in methods covering some of the most used pandas data methods.
Load your data to any number of destinations including popular SQL databases, CSV files, JSON, HTML, parquet, pickle, etc.
Usage
import sqldatamodel as sdm

# Let's grab a random table from Wikipedia
df = sdm.from_html("https://en.wikipedia.org/wiki/FIFA_World_Cup", table_identifier=7)

# Let's see what we found
print(df)
This will output:
┌───────────────┬──────┬──────┬──────────┬──────────┬──────┬──────┬───────┐
│ Confederation │  AFC │  CAF │ CONCACAF │ CONMEBOL │  OFC │ UEFA │ Total │
├───────────────┼──────┼──────┼──────────┼──────────┼──────┼──────┼───────┤
│ Teams         │   43 │   49 │       46 │       89 │    4 │  258 │   489 │
│ Top 16        │    9 │   11 │       15 │       37 │    1 │   99 │   172 │
│ Top 8         │    2 │    4 │        5 │       36 │    0 │  105 │   152 │
│ Top 4         │    1 │    1 │        1 │       23 │    0 │   62 │    88 │
│ Top 2         │    0 │    0 │        0 │       15 │    0 │   29 │    44 │
│ 4th           │    1 │    1 │        0 │        5 │    0 │   15 │    22 │
│ 3rd           │    0 │    0 │        1 │        3 │    0 │   18 │    22 │
│ 2nd           │    0 │    0 │        0 │        5 │    0 │   17 │    22 │
│ 1st           │    0 │    0 │        0 │       10 │    0 │   12 │    22 │
└───────────────┴──────┴──────┴──────────┴──────────┴──────┴──────┴───────┘
[9 rows x 8 columns]
Example:
import sqlite3
from datetime import datetime

import pyodbc
import sqldatamodel as sdm

# For example, set up a source connection
source_db_conn = pyodbc.connect(...)

# A destination connection
destination_db_conn = sqlite3.connect(...)

# Grab your source table
df = sdm.from_sql("select * from source_table", source_db_conn)

# Modify it however you want, whether through plain SQL
df = df.execute_fetch('select "whatever", "i", "want" from "wherever_i_want" where "what_i_need" is not null ')

# Or through any number of built-in methods like filtering
df = df[df['create_date'] >= '2023-01-01']

# Or creating new columns
df['new_date'] = datetime.now()

# Or modifying existing ones
df['salary'] = df['salary'] * 2

# Or applying functions
df['user_id'] = df['user_id'].apply(lambda x: x**2)

# Or deduplicating
df = df.deduplicate(subset=['user_id','user_name'])

# Or iterate through it row-by-row and modify it
for idx, row in df.iter_tuples(index=True):
    if row['number'] % 2 == 0:
        df[idx,'odd_even'] = 'even'
    else:
        df[idx,'odd_even'] = 'odd'

# Or join it using any of the standard join operations
df = df_left.merge(df_right, how='left', left_on='id', right_on='id')

# Or group or aggregate the data:
df_agg = df.group_by(["first", "last", "position"])

# Or have your data imported and described for you
df = sdm.from_parquet('titanic.parquet').describe()

# View result
print(df)
This will output:
┌────────┬─────────────┬──────────┬────────┬────────┬───────┬────────┐
│ metric │ passengerid │ survived │ pclass │    sex │   age │   fare │
├────────┼─────────────┼──────────┼────────┼────────┼───────┼────────┤
│ count  │         891 │      891 │    891 │    891 │   714 │    891 │
│ unique │         891 │        2 │      3 │      2 │    88 │    248 │
│ top    │         891 │        0 │      3 │   male │    24 │   8.05 │
│ freq   │           1 │      549 │    491 │    577 │    30 │     43 │
│ mean   │         446 │        0 │      2 │    NaN │  29.7 │   32.2 │
│ std    │         257 │        0 │      0 │    NaN │ 14.53 │  49.69 │
│ min    │           1 │        0 │      1 │ female │  0.42 │      0 │
│ p25    │         223 │        0 │      2 │    NaN │     6 │    7.9 │
│ p50    │         446 │        0 │      3 │    NaN │    24 │  14.45 │
│ p75    │         669 │        1 │      3 │    NaN │    35 │     31 │
│ max    │         891 │        1 │      3 │   male │    80 │ 512.33 │
│ dtype  │         int │      int │    int │    str │ float │  float │
└────────┴─────────────┴──────────┴────────┴────────┴───────┴────────┘
[12 rows x 7 columns]
Move data quickly from one source or format to another:
# Load it to your destination database:
df.to_sql("new_table", destination_db_conn)

# Or any number of formats including:
df.to_csv("output.csv")
df.to_html("output.html")
df.to_json("output.json")
df.to_latex("output.tex")
df.to_markdown("output.md")
df.to_parquet("output.parquet")
df.to_pickle("output.sdm")
df.to_text("output.txt")
df.to_xml("output.xml")
df.to_local_db("output.db")

# Reload it back again from more formats:
df = sdm.from_csv("output.csv")
df = sdm.from_dict(py_dict)
df = sdm.from_html("output.html")
df = sdm.from_json("output.json")
df = sdm.from_latex("output.tex")
df = sdm.from_markdown("output.md")
df = sdm.from_numpy(np_arr)
df = sdm.from_pandas(pd_df)
df = sdm.from_polars(pl_df)
df = sdm.from_parquet("output.parquet")
df = sdm.from_pickle("output.sdm")
df = sdm.from_sql("output", sqlite3.connect('output.db'))
df = sdm.from_xml("output.xml")
Data Formats
SQLDataModel seamlessly interacts with a wide range of data formats, providing a versatile platform for data extraction, conversion, and writing. Supported formats include:
- Arrow: Convert to and from Apache Arrow format, pyarrow required.
- CSV: Extract from and write to comma separated value, .csv, files.
- Excel: Extract from and write to Excel .xlsx files, openpyxl required.
- HTML: Extract from web and write to and from .html files including formatted string literals.
- JSON: Extract from and write to .json files, JSON-like objects, or JSON formatted string literals.
- LaTeX: Extract from and write to .tex files, LaTeX formatted string literals.
- Markdown: Extract from and write to .MD files, Markdown formatted string literals.
- Numpy: Convert to and from numpy.ndarray objects, numpy required.
- Pandas: Convert to and from pandas.DataFrame objects, pandas required.
- Parquet: Extract from and write to .parquet files, pyarrow required.
- Pickle: Extract from and write to .pkl files, package uses .sdm extension when pickling for SQLDataModel metadata.
- Polars: Convert to and from polars.DataFrame objects, polars required.
- SQL: Extract from and write to the following popular SQL databases:
  - SQLite: Using the built-in sqlite3 module.
  - PostgreSQL: Using the psycopg2 package.
  - SQL Server: Using the pyodbc package.
  - Oracle: Using the cx_Oracle package.
  - Teradata: Using the teradatasql package.
- Text: Write to and from .txt files including other SQLDataModel string representations.
- TSV or delimited: Write to and from files delimited by:
  - \t: Tab separated values or .tsv files.
  - \s: Single space or whitespace separated values.
  - ;: Semicolon separated values.
  - |: Pipe separated values.
  - :: Colon separated values.
  - ,: Comma separated values or .csv files.
- XML: Extract from xml formats and write to and from .xml files including XML formatted string literals.
- Python objects:
  - dictionaries: Convert to and from collections of python dict objects.
  - lists: Convert to and from collections of python list objects.
  - tuples: Convert to and from collections of python tuple objects.
  - namedtuples: Convert to and from collections of namedtuples objects.
Pretty Printing
SQLDataModel also pretty prints your table in any color you specify. Use SQLDataModel.set_display_color() and provide either a hex value or a tuple of RGB values, then print the table. Example output:
┌───┬─────────────────────┬────────────┬─────────────┬────────┬─────────┐
│   │ full_name           │ date       │ country     │    pin │ service │
├───┼─────────────────────┼────────────┼─────────────┼────────┼─────────┤
│ 0 │ Pamela Berg         │ 2024-09-15 │ New Zealand │   3010 │    3.02 │
│ 1 │ Mason Hoover        │ 2024-01-23 │ Australia   │   6816 │    5.01 │
│ 2 │ Veda Suarez         │ 2023-09-04 │ Ukraine     │   1175 │    4.65 │
│ 3 │ Guinevere Cleveland │ 2024-03-22 │ New Zealand │   4962 │    3.81 │
│ 4 │ Vincent Mccoy       │ 2023-09-16 │ France      │   4446 │    2.95 │
│ 5 │ Holmes Kemp         │ 2024-11-13 │ Germany     │   9396 │    4.61 │
│ 6 │ Donna Mays          │ 2023-06-06 │ Costa Rica  │   8153 │    5.34 │
│ 7 │ Rama Galloway       │ 2023-09-22 │ Italy       │   3384 │    3.87 │
│ 8 │ Lucas Rodriquez     │ 2024-03-16 │ New Zealand │   3278 │    2.73 │
│ 9 │ Hunter Donaldson    │ 2023-06-30 │ Belgium     │   1593 │    4.58 │
└───┴─────────────────────┴────────────┴─────────────┴────────┴─────────┘
Note
- No additional dependencies are installed with this package; however, you will need pandas or numpy installed to create pandas or numpy objects.
- Use SQLDataModel.set_display_color() to modify the terminal color of the table; by default no color styling is applied.
- Use SQLDataModel.get_supported_sql_connections() to view supported SQL connection packages. Please reach out with any issues or questions, thanks!
- __add__(value: str | int | float | SQLDataModel) → SQLDataModel
Implements the + operator functionality for compatible SQLDataModel operations.
- Parameters:
value (str | int | float | SQLDataModel) – The value to be added to each element in the SQLDataModel.
- Raises:
TypeError – If the provided value is not a valid type (str, int, float or SQLDataModel).
DimensionError – If the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as addition) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
- Returns:
A new SQLDataModel resulting from the addition operation.
- Return type:
SQLDataModel
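The DimensionError condition described above can be pictured with a small stand-alone check. This is only a sketch in plain Python: `shape_of`, `check_shapes` and the local `DimensionError` class are hypothetical stand-ins, and the rule assumed here (shapes must match exactly) is an illustration rather than the package's actual compatibility rule.

```python
class DimensionError(Exception):
    """Hypothetical stand-in for the package's DimensionError."""


def shape_of(rows):
    """Return (row_count, column_count) for a list of rows."""
    return (len(rows), len(rows[0]) if rows else 0)


def check_shapes(left, right):
    """Raise DimensionError unless both operands share the same shape (assumed rule)."""
    if shape_of(left) != shape_of(right):
        raise DimensionError(
            f"shape {shape_of(left)} is incompatible with shape {shape_of(right)}"
        )


left = [[2], [4], [8], [16]]        # shape (4, 1)
right = [[1, 2], [3, 4], [5, 6]]    # shape (3, 2)

try:
    check_shapes(left, right)
    outcome = "compatible"
except DimensionError as exc:
    outcome = str(exc)

print(outcome)
```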
Example:
import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar addition
df['x + 100'] = df['x'] + 100

# Perform vector addition using another column
df['x + y'] = df['x'] + df['y']

# View both results
print(df)
This will output:
┌─────┬─────┬─────────┬───────┐
│   x │   y │ x + 100 │ x + y │
├─────┼─────┼─────────┼───────┤
│   2 │  10 │     102 │    12 │
│   4 │  20 │     104 │    24 │
│   8 │  30 │     108 │    38 │
│  16 │  40 │     116 │    56 │
│  32 │  50 │     132 │    82 │
└─────┴─────┴─────────┴───────┘
[5 rows x 4 columns]
We can also use addition to concatenate strings:
import sqldatamodel as sdm

# Sample data
headers = ['First', 'Last']
data = [['Alice', 'Smith'],['Bob', 'Johnson'],['Charlie', 'Hall'],['David', 'Brown']]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Concatenate scalar character
df['Loud First'] = df['First'] + '!'

# Concatenate scalar and vector using existing columns
df['Full Name'] = df['First'] + ' ' + df['Last']

# View it
print(df)
This will output:
┌─────────┬─────────┬────────────┬──────────────┐
│ First   │ Last    │ Loud First │ Full Name    │
├─────────┼─────────┼────────────┼──────────────┤
│ Alice   │ Smith   │ Alice!     │ Alice Smith  │
│ Bob     │ Johnson │ Bob!       │ Bob Johnson  │
│ Charlie │ Hall    │ Charlie!   │ Charlie Hall │
│ David   │ Brown   │ David!     │ David Brown  │
└─────────┴─────────┴────────────┴──────────────┘
[4 rows x 4 columns]
Note
Mixing summands such as int + float will work; however, an exception will be raised when attempting to perform addition on incompatible types such as str + float.
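The type-mixing rule in the note mirrors how plain Python addition behaves. The sketch below uses an ordinary list in place of a model column; `add_elementwise` is a hypothetical helper written for illustration and is not part of the package.

```python
def add_elementwise(column, value):
    """Add a scalar to every element, relying on plain Python semantics."""
    return [item + value for item in column]


# int + float mixes without complaint
mixed = add_elementwise([1, 2, 3], 0.5)

# str + str concatenates
joined = add_elementwise(['Alice', 'Bob'], '!')

# str + float is rejected by Python itself
try:
    add_elementwise(['Alice', 'Bob'], 1.5)
    error = None
except TypeError as exc:
    error = type(exc).__name__

print(mixed, joined, error)
```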
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __and__(other: SQLDataModel) → set[int]
Implements the bitwise AND operator & for combining the result sets of self and other.
- Parameters:
other – The SQLDataModel to combine with.
- Returns:
A set of indices representing the intersection of the result rows from both SQLDataModel instances.
- Return type:
set[int]
Example:
import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Apply some filtering conditions to both models
filter_1 = df[df['Age'] <= 40]
filter_2 = df[df['Service'] > 2]

# Perform a bitwise AND operation to return a new model
result = df[filter_1 & filter_2]

# View result
print(result)
This will output the result of filtering by ‘Age’ and ‘Service’:
┌───┬───────┬────────┬─────┬─────────┬────────────┬────────┐
│   │ First │ Last   │ Age │ Service │ Hired      │ Gender │
├───┼───────┼────────┼─────┼─────────┼────────────┼────────┤
│ 1 │ Kelly │ Lee    │  32 │    8.00 │ 2016-09-18 │ Female │
│ 2 │ Mike  │ Harlin │  36 │    3.90 │ 2020-08-27 │ Male   │
└───┴───────┴────────┴─────┴─────────┴────────────┴────────┘
[2 rows x 6 columns]
Note
- If other is not an instance of SQLDataModel, a NotImplementedError is raised to be consistent with current conventions.
- See SQLDataModel.__or__() for the bitwise OR operation.
- Changelog:
- Version 0.7.4 (2024-06-13):
New method.
- __bool__() → bool
Implements the logical boolean operator for SQLDataModel using the current row count.
- Returns:
True if SQLDataModel.row_count != 0, False otherwise.
- Return type:
bool
Example:
import sqldatamodel as sdm

# Create an empty model
df = sdm.SQLDataModel(headers=['Stage', 'Match', 'Result'])

# Use boolean method to avoid duplicating result
if not df:
    df[0] = ['Group', 1, 'Scotland Win']
else:
    print('Match result already stored')
Note
- This method is equivalent to df.row_count != 0.
- See SQLDataModel.__eq__() and related comparison methods for more details.
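The equivalence stated in the note can be sketched with a minimal class. `TinyModel` is a hypothetical stand-in that only tracks rows; it is not the package's implementation, and it only illustrates how truthiness can be tied to a row count.

```python
class TinyModel:
    """Hypothetical stand-in that stores rows and nothing else."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])

    @property
    def row_count(self):
        return len(self.rows)

    def __bool__(self):
        # Truthiness follows the row count, as the note describes
        return self.row_count != 0


empty = TinyModel()
filled = TinyModel([('Group', 1, 'Scotland Win')])

print(bool(empty), bool(filled))
```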
- Changelog:
- Version 0.7.1 (2024-06-09):
New method.
- __eq__(other) → set[int]
Implements the equality operator == for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.
- Parameters:
other – The SQLDataModel or scalar (int, str, float) to compare with.
- Returns:
The set of row indices resulting from the operation that satisfy the condition.
- Return type:
set[int]
Example:
import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter by 'Gender' column
df = df[df['Gender'] == 'Female']

# View result
print(df)
This will output:
┌───┬───────┬──────┬──────┬─────────┬────────────┬────────┐
│   │ First │ Last │  Age │ Service │ Hired      │ Gender │
├───┼───────┼──────┼──────┼─────────┼────────────┼────────┤
│ 0 │ Kelly │ Lee  │   32 │    8.00 │ 2016-09-18 │ Female │
│ 1 │ Sarah │ West │   51 │    0.70 │ 2023-10-01 │ Female │
└───┴───────┴──────┴──────┴─────────┴────────────┴────────┘
[2 rows x 6 columns]
Note
- For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.
- For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.
- All the equality operations return a python set object containing the row indices which were returned from the evaluation.
- All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.
- Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.
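The parentheses requirement follows from Python's operator precedence, where & and | bind more tightly than comparison operators. The sketch below shows the chaining pattern with a hypothetical `Column` class whose comparisons return sets of row indices; it is an illustration under that assumption, not the package's code.

```python
class Column:
    """Hypothetical column whose comparisons return sets of row indices."""

    def __init__(self, values):
        self.values = list(values)

    def _where(self, predicate):
        return {i for i, v in enumerate(self.values) if predicate(v)}

    def __eq__(self, other):
        return self._where(lambda v: v == other)

    def __le__(self, other):
        return self._where(lambda v: v <= other)

    def __gt__(self, other):
        return self._where(lambda v: v > other)


age = Column([27, 32, 36, 51, 42])
service = Column([1.22, 8.0, 3.9, 0.7, 11.5])
gender = Column(['Male', 'Female', 'Male', 'Female', 'Male'])

# Each comparison is wrapped in parentheses before combining the sets
both = (age <= 40) & (service > 2)
either = (gender == 'Female') | (service > 5.0)

print(sorted(both), sorted(either))
```

Without the parentheses, an expression such as `age <= 40 & service > 2` would be parsed as a chained comparison around `40 & service`, which is not the intended filter.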
- __floordiv__(value: int | float | SQLDataModel) → SQLDataModel
Implements the // operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
DimensionError – If the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
ZeroDivisionError – If value is 0.
- Returns:
A new SQLDataModel resulting from the floor division operation.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Sample data
headers = ['x', 'y']
data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Perform scalar floor division
df['y // 10'] = df['y'] // 10

# Perform vector floor division using another column
df['y // x'] = df['y'] // df['x']

# View both results
print(df)
This will output:
┌─────┬─────┬─────────┬────────┐
│   x │   y │ y // 10 │ y // x │
├─────┼─────┼─────────┼────────┤
│   2 │  10 │       1 │      5 │
│   4 │  20 │       2 │      5 │
│   8 │  30 │       3 │      3 │
│  16 │  40 │       4 │      2 │
│  32 │  50 │       5 │      1 │
└─────┴─────┴─────────┴────────┘
[5 rows x 4 columns]
Note
Mixing divisor types such as int // float will work; however, an exception will be raised when attempting to perform division on incompatible types such as str // float.
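The behavior described in the note matches plain Python floor division, which can be checked without the package. `floordiv_elementwise` below is a hypothetical helper operating on an ordinary list, used only to illustrate the mixed-type, incompatible-type and zero-divisor cases.

```python
def floordiv_elementwise(column, value):
    """Floor-divide every element by a scalar using plain Python semantics."""
    return [item // value for item in column]


# int // int stays int, int // float yields float
ints = floordiv_elementwise([10, 20, 30, 40, 50], 3)
floats = floordiv_elementwise([10, 20, 30], 4.0)

# Collect the exception names raised for invalid operands
errors = []
for column, divisor in ((['a', 'b'], 2.0), ([10, 20], 0)):
    try:
        floordiv_elementwise(column, divisor)
    except (TypeError, ZeroDivisionError) as exc:
        errors.append(type(exc).__name__)

print(ints, floats, errors)
```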
- Changelog:
- Version 0.2.2 (2024-03-26):
New method.
- __ge__(other) → set[int]
Implements the greater than or equal to operator >= for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.
- Parameters:
other – The SQLDataModel or scalar (int, str, float) to compare with.
- Returns:
The set of row indices resulting from the operation that satisfy the condition.
- Return type:
set[int]
Example:
import datetime

import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter by 'Hired' column
df = df[df['Hired'] >= datetime.date(2020,1,1)]

# View result
print(df)
This will output:
┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐
│   │ First │ Last   │  Age │ Service │ Hired      │ Gender │
├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ John  │ Smith  │   27 │    1.22 │ 2023-02-01 │ Male   │
│ 1 │ Mike  │ Harlin │   36 │    3.90 │ 2020-08-27 │ Male   │
│ 2 │ Sarah │ West   │   51 │    0.70 │ 2023-10-01 │ Female │
└───┴───────┴────────┴──────┴─────────┴────────────┴────────┘
[3 rows x 6 columns]
Note
- For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.
- For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.
- All the comparison operations return a python set object containing the row indices which result from the evaluation.
- All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.
- Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __getitem__(target_indicies) → SQLDataModel
Retrieves a subset of the SQLDataModel based on the specified indices.
- Parameters:
slc – Indices specifying the rows and columns to be retrieved. This can be an integer, a tuple, a slice, or a combination of these.
- Raises:
ValueError – If there are issues with the specified indices, such as invalid row or column names.
TypeError – If the slc type is not compatible with indexing SQLDataModel.
IndexError – If the slc includes a range or int that is outside of the current row count or column count.
- Returns:
An instance of SQLDataModel containing the selected subset of data.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Sample data
headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Retrieve a specific row by index
subset_model = df[3]

# Retrieve multiple rows and specific columns using a tuple
subset_model = df[(1, 2, 4), ["First", "Service", "Age"]]

# Retrieve a range of rows and all columns using a slice
subset_model = df[1:4]

# Retrieve a single column by name
subset_model = df["First"]
- Changelog:
- Version 0.5.0 (2024-05-09):
Modified index retention behavior to pass through row indices and avoid resetting view order.
Note
- The slc parameter can be an integer, a tuple of disconnected row indices, a slice representing a range of rows, a string or list of strings representing column names, or a tuple combining row and column indices.
- The returned SQLDataModel instance will contain the specified subset of rows and columns, retaining the row indices of the original view.
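To make the accepted row selectors concrete, the sketch below normalizes an integer, a slice, or a tuple of integers into a list of row positions. `resolve_rows` is a hypothetical helper written for this illustration; support for negative indices is an assumption, and column selection is left out entirely.

```python
def resolve_rows(key, row_count):
    """Turn an int, slice, or tuple of ints into a list of row positions."""
    if isinstance(key, int):
        if not -row_count <= key < row_count:
            raise IndexError(f"row index {key} is out of range for {row_count} rows")
        return [key % row_count]
    if isinstance(key, slice):
        # slice.indices clamps the bounds to the available rows
        return list(range(*key.indices(row_count)))
    if isinstance(key, tuple) and all(isinstance(k, int) for k in key):
        return [resolve_rows(k, row_count)[0] for k in key]
    raise TypeError(f"unsupported row selector type: {type(key).__name__}")


single = resolve_rows(3, 5)            # one row
ranged = resolve_rows(slice(1, 4), 5)  # contiguous range of rows
picked = resolve_rows((1, 2, 4), 5)    # disconnected rows

print(single, ranged, picked)
```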
- __gt__(other) → set[int]
Implements the greater than operator > for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.
- Parameters:
other – The SQLDataModel or scalar (int, str, float) to compare with.
- Returns:
The set of row indices resulting from the operation that satisfy the condition.
- Return type:
set[int]
Example:
import sqldatamodel as sdm

headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
    ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter by 'Service' column
df = df[df['Service'] > 5.0]

# View result
print(df)
This will output:
┌───┬───────┬─────────┬──────┬─────────┬────────────┬────────┐
│   │ First │ Last    │  Age │ Service │ Hired      │ Gender │
├───┼───────┼─────────┼──────┼─────────┼────────────┼────────┤
│ 0 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │ Female │
│ 1 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │ Male   │
└───┴───────┴─────────┴──────┴─────────┴────────────┴────────┘
[2 rows x 6 columns]
Note
- For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.
- For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.
- All the comparison operations return a python set object containing the row indices which were returned from the evaluation.
- All operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.
- Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __iadd__(value: str | int | float | SQLDataModel) → SQLDataModel
Implements the += operator functionality for compatible SQLDataModel operations.
- Parameters:
value (str | int | float | SQLDataModel) – The value to be added to each element in the SQLDataModel.
- Raises:
TypeError – If the provided value is not a valid type (str, int, float, or SQLDataModel).
- Returns:
The modified SQLDataModel after the addition operation.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Sample data
headers = ['idx', 'first', 'last', 'age', 'service']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 39, 0.7),
    (2, 'mike', 'harlin', 36, 3),
    (3, 'pat', 'douglas', 42, 11.5)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Modifying first name column with a bang!
df['first'] += '!'

# View model
print(df)
This will output:
┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john!  │ smith   │     27 │    1.22 │
│ 1 │ sarah! │ west    │     39 │    0.70 │
│ 2 │ mike!  │ harlin  │     36 │    3.00 │
│ 3 │ pat!   │ douglas │     42 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __idiv__(value: int | float | SQLDataModel) → SQLDataModel
Implements the /= operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
ZeroDivisionError – If the divisor value is 0.
- Returns:
The modified SQLDataModel after the division operation.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Budget'])

# Adjust existing column
df['Budget'] /= 52
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __ifloordiv__(value: int | float | SQLDataModel) → SQLDataModel
Implements the //= operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.
- Raises:
TypeError – If the provided value is not a valid type (int or float).
ZeroDivisionError – If value is 0.
- Returns:
A new SQLDataModel resulting from the floor division operation.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Sample data
headers = ['x']
data = [[10],[20],[30],[40],[50]]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Modify the existing column
df['x'] //= 3

# View result
print(df)
This will output:
┌───┬──────┐
│   │    x │
├───┼──────┤
│ 0 │    3 │
│ 1 │    6 │
│ 2 │   10 │
│ 3 │   13 │
│ 4 │   16 │
└───┴──────┘
[5 rows x 1 columns]
- Changelog:
- Version 0.2.2 (2024-03-26):
New method.
- __imul__(value: int | float | SQLDataModel) → SQLDataModel
Implements the *= operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to multiply each element in the SQLDataModel by.
- Raises:
TypeError – If the provided value is not a valid type (int or float).
- Returns:
The modified SQLDataModel after the multiplication operation.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Salary'])

# Give raises to all!
df['Salary'] *= 12
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __init__(data: list[list] = None, headers: list[str] = None, dtypes: dict[str, str] = None, display_max_rows: int = None, min_column_width: int = 3, max_column_width: int = 38, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic', display_color: str = None, display_index: bool = True, display_float_precision: int = 2, infer_types: bool = False, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = 'default')
Initializes a new instance of SQLDataModel.
- Parameters:
data (list[list]) – The data to populate the model. Should be a list of lists or a list of tuples or a dictionary orientated by rows or columns.
headers (list[str], optional) – The column headers for the model. If not provided, default headers will be used.
dtypes (dict, optional) – A dictionary specifying the data types for each column. Format: {‘column’: ‘dtype’}.
display_max_rows (int, optional) – The maximum number of rows to display. Default is None, using terminal height to format number of rows.
min_column_width (int, optional) – The minimum width for each column. Default is 3.
max_column_width (int, optional) – The maximum width for each column. Default is 38.
column_alignment (str, optional) – The alignment for columns, must be ‘dynamic’, ‘left’, ‘center’ or ‘right’. Default is ‘dynamic’.
display_color (str|tuple, optional) – The color for display as hex code string or rgb tuple.
display_index (bool, optional) – Whether to display row indices. Default is True.
display_float_precision (int, optional) – The number of decimal places to display for float values. Default is 2.
infer_types (bool, optional) – Whether to infer the data types based on a randomly selected sample. Default is False, using first row to derive the corresponding type directly.
table_style (str, optional) – The styling to use when representing the table in textual formats. Must be ‘ascii’, ‘bare’, ‘dash’, ‘default’, ‘double’, ‘latex’, ‘list’, ‘markdown’, ‘outline’, ‘pandas’, ‘polars’, ‘postgresql’, ‘round’, ‘rst-grid’ or ‘rst-simple’. Default is ‘default’.
- Raises:
ValueError – If data and headers are not provided, or if data is of insufficient length.
TypeError – If data or headers is not a valid type (list or tuple), or if dtypes is not a dictionary.
DimensionError – If the length of headers does not match the implied column count from the data.
SQLProgrammingError – If there’s an issue with executing SQL statements during initialization.
Example:
import sqldatamodel as sdm

# Create sample data
data = [('Alice', 20, 'F'), ('Bob', 25, 'M'), ('Gerald', 30, 'M')]

# Create the model with custom headers
df = sdm.SQLDataModel(data, headers=['Name','Age','Sex'])

# Display the model
print(df)
This will output the SQLDataModel formatted to fit within the current terminal:
┌────────┬──────┬──────┐
│ Name   │  Age │ Sex  │
├────────┼──────┼──────┤
│ Alice  │   20 │ F    │
│ Bob    │   25 │ M    │
│ Gerald │   30 │ M    │
└────────┴──────┴──────┘
[3 rows x 3 columns]
A SQLDataModel can be initialized from dozens of data formats, including python dictionaries:
import sqldatamodel as sdm

# Dictionary with sample data
data = {
    'Name': ['Ali', 'Bob', 'Chris'],
    'Role': ['Judge', 'Pilot', 'Nurse'],
    'Height': [174.2, 180.9, 173.4],
}

# Create the model and set a new style
df = sdm.SQLDataModel(data, table_style='list')

# View it
print(df)
This will output the SQLDataModel using the ‘list’ styling:
Name   Role    Height
-----  -----  -------
Ali    Judge   174.20
Bob    Pilot   180.90
Chris  Nurse   173.40
Note
- If data is not provided, an empty model is created with headers. At least one of data, headers or dtypes is required to instantiate the model.
- If headers are not provided, default headers will be generated using the format '0', '1', ..., N where N is the column count.
- If dtypes is provided, it must be a dictionary with column names as keys and Python data types as string values, e.g., {‘first_name’: ‘str’, ‘weight’: ‘float’}.
- If infer_types = True and dtypes are provided, the order will be resolved by first inferring the types, then overriding the inferred types for each {col:type} provided in the dtypes argument. If one is not provided, then the inferred type will be used as a fallback.
- For creating SQLDataModel from file formats like CSV, Markdown, LaTeX, Excel, Parquet or Text files, see SQLDataModel.from_data() or go to the format specific constructor.
- For creating SQLDataModel from object formats like Pyarrow, JSON, HTML, Pandas, Numpy or Polars, see the format specific constructor like SQLDataModel.from_pandas() or SQLDataModel.from_numpy().
- Use SQLDataModel.set_table_style() to change the format and styling used when displaying the model.
- Use SQLDataModel.set_display_index() to toggle inclusion of the index column in table representations.
- Use SQLDataModel.set_display_color() to modify the terminal color used to style the model.
- Use SQLDataModel.set_display_max_rows() to modify the number of rows output in the representations.
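The header and type resolution rules in the notes can be sketched in plain Python. All three helpers below (`default_headers`, `infer_from_first_row`, `resolve_dtypes`) are hypothetical and written only for illustration; zero-based default header names and first-row inference are assumptions, not a description of the package's internals.

```python
def default_headers(column_count):
    """Generate positional header names '0', '1', ... (assumed zero-based)."""
    return [str(i) for i in range(column_count)]


def infer_from_first_row(row, headers):
    """Derive a type name per column from the first row of data."""
    return {header: type(value).__name__ for header, value in zip(headers, row)}


def resolve_dtypes(inferred, dtypes=None):
    """Start from inferred types, then let explicit dtypes override them."""
    return {**inferred, **(dtypes or {})}


data = [('Alice', 20, 'F'), ('Bob', 25, 'M')]
headers = default_headers(len(data[0]))
inferred = infer_from_first_row(data[0], headers)

# Only column '1' is overridden; the rest fall back to the inferred types
resolved = resolve_dtypes(inferred, {'1': 'float'})

print(headers, resolved)
```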
- Changelog:
- Version 2.3.0 (2026-01-21):
Modified to handle decimal.Decimal type by lossy conversion to Python’s float type.
- Version 0.12.0 (2024-07-06):
Modified the default minimum number of displayed rows from 1 to 4 when SQLDataModel.display_max_rows is None.
- Version 0.11.0 (2024-07-05):
Added additional option ‘latex’ for table_style parameter.
- Version 0.9.3 (2024-06-28):
Added additional options ‘rst-simple’ and ‘rst-grid’ for table_style parameter.
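The 2.3.0 entry describes the conversion of decimal.Decimal values as lossy. A short standard-library sketch, independent of SQLDataModel, shows what that means in practice: digits beyond float precision do not survive the round trip.

```python
from decimal import Decimal

exact = Decimal('0.1234567890123456789012345')

# Converting to float keeps only about 15-17 significant digits
approx = float(exact)

# Going back to Decimal does not recover the original value
round_trip = Decimal(str(approx))

print(exact)
print(approx)
print(round_trip == exact)
```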
- __ipow__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the **= operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to raise each element in the SQLDataModel to.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
- Returns:
The modified SQLDataModel after the exponential operation.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('example.csv', headers=['ID', 'Salary'])

    # More raises!
    df['Salary'] **= 2
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __isub__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the -= operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to subtract from each element in the SQLDataModel.
- Raises:
TypeError – If the provided value is not a valid type (int, float, or SQLDataModel).
- Returns:
The modified SQLDataModel after the subtraction operation.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    headers = ['idx', 'first', 'last', 'age', 'service']
    data = [
        (0, 'john', 'smith', 27, 1.22),
        (1, 'sarah', 'west', 39, 0.7),
        (2, 'mike', 'harlin', 36, 3),
        (3, 'pat', 'douglas', 42, 11.5)
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Modifying age column in the best direction
    df['age'] -= 10

    # View model
    print(df)
This will output:
┌───┬────────┬─────────┬────────┬─────────┐ │ │ first │ last │ age │ service │ ├───┼────────┼─────────┼────────┼─────────┤ │ 0 │ john │ smith │ 17 │ 1.22 │ │ 1 │ sarah │ west │ 29 │ 0.70 │ │ 2 │ mike │ harlin │ 26 │ 3.00 │ │ 3 │ pat │ douglas │ 32 │ 11.50 │ └───┴────────┴─────────┴────────┴─────────┘ [4 rows x 4 columns]
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __iter__() Iterator[tuple][source]
Returns an iterator over the current range of rows in the SQLDataModel starting from the first row.
- Raises:
StopIteration – When there are no more rows to return.
- Yields:
tuple – Next row fetched from the current SQLDataModel.
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['Name', 'Age', 'Height']
    data = [
        ('John', 30, 175.3),
        ('Alice', 28, 162.0),
        ('Travis', 35, 185.8)
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Iterate through rows
    for row in df:
        print(row)
This will output:
    (0, 'John', 30, 175.3)
    (1, 'Alice', 28, 162.0)
    (2, 'Travis', 35, 185.8)
Note
This iterator fetches rows from the SQLDataModel using a SQL statement generated by the SQLDataModel._generate_sql_stmt() method.
The iteration starts from the first row, index 0, and continues until SQLDataModel.row_count is reached.
See SQLDataModel.iter_rows() for iterating over rows with custom start and stop indices.
See SQLDataModel.iter_tuples() for iterating over rows as named tuples.
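The note says rows are fetched with a generated SQL statement. As a rough sketch of that idea, and not the library's actual code, the following uses the standard-library sqlite3 module. The table name "sdm", the index column "idx" and the helper iter_rows_sketch are assumptions made for illustration.

```python
import sqlite3


def iter_rows_sketch(conn, start=0, stop=None):
    """Yield rows as tuples, ordered by index, from start up to (not including) stop."""
    stmt = 'SELECT "idx", "Name", "Age", "Height" FROM "sdm" WHERE "idx" >= ?'
    params = [start]
    if stop is not None:
        stmt += ' AND "idx" < ?'
        params.append(stop)
    stmt += ' ORDER BY "idx"'
    yield from conn.execute(stmt, params)


# Hypothetical in-memory table holding the sample data
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "Name" TEXT, "Age" INTEGER, "Height" REAL)'
)
conn.executemany(
    'INSERT INTO "sdm" VALUES (?, ?, ?, ?)',
    [(0, 'John', 30, 175.3), (1, 'Alice', 28, 162.0), (2, 'Travis', 35, 185.8)],
)

rows = list(iter_rows_sketch(conn))
for row in rows:
    print(row)
```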
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __le__(other) set[int][source]
Implements the less than or equal to operator <= for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.
- Parameters:
other – The SQLDataModel or scalar (int, str, float) to compare with.
- Returns:
The set of row indices resulting from the operation that satisfy the condition.
- Return type:
set[int]
Example:
    import sqldatamodel as sdm

    headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
    data = [
        ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
        ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
        ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
        ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
        ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Filter by 'Age' column
    df = df[df['Age'] <= 40]

    # View result
    print(df)
This will output:
┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐ │ │ First │ Last │ Age │ Service │ Hired │ Gender │ ├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤ │ 0 │ John │ Smith │ 27 │ 1.22 │ 2023-02-01 │ Male │ │ 1 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ Female │ │ 2 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ Male │ └───┴───────┴────────┴──────┴─────────┴────────────┴────────┘ [3 rows x 6 columns]
Note
For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.
For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.
All comparison operations return a Python set object containing the row indices returned from the evaluation.
Operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.
Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.
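To make the note concrete, here is a minimal sketch of a comparison operator that returns a set of row indices instead of a boolean. ColumnSketch is a hypothetical stand-in written for this example; it says nothing about how SQLDataModel implements the operator internally.

```python
class ColumnSketch:
    """Hypothetical single-column container keyed by row index."""

    def __init__(self, values):
        self.values = dict(enumerate(values))

    def __le__(self, other):
        # Scalar comparison: keep the indices whose value satisfies the condition
        return {idx for idx, val in self.values.items() if val <= other}

    def __lt__(self, other):
        return {idx for idx, val in self.values.items() if val < other}


age = ColumnSketch([27, 32, 36, 51, 42])

print(sorted(age <= 36))
print(sorted(age < 36))
```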
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __len__() int[source]
Returns the SQLDataModel.row_count property for the current SQLDataModel, which represents the current number of rows in the model.
- Returns:
The total number of rows in the SQLDataModel.
- Return type:
int
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

    # Get current length
    num_rows = len(df)

    # View number
    print(num_rows)
This will output:
    1000
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __lt__(other) set[int][source]
Implements the less than operator < for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.
- Parameters:
other – The SQLDataModel or scalar (int, str, float) to compare with.
- Returns:
The set of row indices resulting from the operation that satisfy the condition.
- Return type:
set[int]
Example:
    import sqldatamodel as sdm

    headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
    data = [
        ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
        ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
        ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
        ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
        ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Filter by 'Age' column
    df = df[df['Age'] < 40]

    # View result
    print(df)
This will output:
┌───┬───────┬────────┬──────┬─────────┬────────────┬────────┐ │ │ First │ Last │ Age │ Service │ Hired │ Gender │ ├───┼───────┼────────┼──────┼─────────┼────────────┼────────┤ │ 0 │ John │ Smith │ 27 │ 1.22 │ 2023-02-01 │ Male │ │ 1 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ Female │ │ 2 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ Male │ └───┴───────┴────────┴──────┴─────────┴────────────┴────────┘ [3 rows x 6 columns]
Note
For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.
For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.
All comparison operations return a Python set object containing the row indices returned from the evaluation.
Operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.
Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __mul__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the * operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to multiply each element in the SQLDataModel by.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as multiplication) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
- Returns:
A new SQLDataModel resulting from the multiplication operation.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['x', 'y']
    data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Perform scalar multiplication
    df['x * 10'] = df['x'] * 10

    # Perform vector multiplication using another column
    df['x * y'] = df['x'] * df['y']

    # View results
    print(df)
This will output:
┌─────┬─────┬────────┬───────┐ │ x │ y │ x * 10 │ x * y │ ├─────┼─────┼────────┼───────┤ │ 2 │ 10 │ 20 │ 20 │ │ 4 │ 20 │ 40 │ 80 │ │ 8 │ 30 │ 80 │ 240 │ │ 16 │ 40 │ 160 │ 640 │ │ 32 │ 50 │ 320 │ 1600 │ └─────┴─────┴────────┴───────┘ [5 rows x 4 columns]
Note
Mixing multipliers such as int * float will work; however, an exception will be raised when attempting to perform multiplication on incompatible types such as str * float.
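The scalar and vector cases, together with the shape check behind DimensionError, can be sketched in plain Python. The multiply function and the local DimensionError class below are illustrative assumptions on single-column lists, not the library's code.

```python
class DimensionError(Exception):
    """Local stand-in for the exception named in the documentation."""


def multiply(column, value):
    """Multiply a list of numbers by a scalar or by an equally long list."""
    if isinstance(value, (int, float)):
        return [x * value for x in column]
    if isinstance(value, list):
        if len(value) != len(column):
            raise DimensionError(
                f"shape ({len(column)}, 1) is incompatible with shape ({len(value)}, 1)"
            )
        return [x * y for x, y in zip(column, value)]
    raise TypeError(f"unsupported operand type: {type(value).__name__}")


x = [2, 4, 8, 16, 32]
y = [10, 20, 30, 40, 50]

print(multiply(x, 10))   # scalar multiplication
print(multiply(x, y))    # element-wise multiplication with another column
```

The two printed lists correspond to the 'x * 10' and 'x * y' columns in the output table above.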
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __ne__(other) set[int][source]
Implements the not equal to operator != for comparing SQLDataModel against other and performing the equivalent set operation against the model’s current indices.
- Parameters:
other – The SQLDataModel or scalar (int, str, float) to compare with.
- Returns:
The set of row indices resulting from the operation that satisfy the condition.
- Return type:
set[int]
Example:
    import sqldatamodel as sdm

    headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
    data = [
        ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
        ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
        ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
        ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
        ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Filter by 'First' column
    df = df[df['First'] != 'John']

    # View result
    print(df)
This will output:
┌───┬───────┬─────────┬──────┬─────────┬────────────┬────────┐ │ │ First │ Last │ Age │ Service │ Hired │ Gender │ ├───┼───────┼─────────┼──────┼─────────┼────────────┼────────┤ │ 0 │ Kelly │ Lee │ 32 │ 8.00 │ 2016-09-18 │ Female │ │ 1 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ Male │ │ 2 │ Sarah │ West │ 51 │ 0.70 │ 2023-10-01 │ Female │ │ 3 │ Pat │ Douglas │ 42 │ 11.50 │ 2015-11-06 │ Male │ └───┴───────┴─────────┴──────┴─────────┴────────────┴────────┘ [4 rows x 6 columns]
Note
For scalar other (int, str, or float), compares each element with the scalar and returns the row indices evaluating to True.
For SQLDataModel other, compares each element across X rows for Y columns for all (X_i, Y_j) in range of row_count and column_count and returns those row indices evaluating to True.
All comparison operations return a Python set object containing the row indices returned from the evaluation.
Operations on standard types like int, float or str follow standard behavior and are not modified by performing the operations.
Operations can be chained using standard set operators like & and | to allow complex filtering; multiple operations require parentheses.
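Because each comparison yields a plain set, conditions combine with Python's own set operators. The sketch below uses ordinary sets built from a trimmed copy of the sample data rather than SQLDataModel itself, so it only illustrates the set logic.

```python
data = [
    ('John', 'Smith', 27, 'Male'),
    ('Kelly', 'Lee', 32, 'Female'),
    ('Mike', 'Harlin', 36, 'Male'),
    ('Sarah', 'West', 51, 'Female'),
    ('Pat', 'Douglas', 42, 'Male'),
]

# Each condition is a set of matching row indices
not_john = {i for i, row in enumerate(data) if row[0] != 'John'}
under_40 = {i for i, row in enumerate(data) if row[2] < 40}
is_male = {i for i, row in enumerate(data) if row[3] == 'Male'}

both = not_john & is_male      # rows matching both conditions
either = under_40 | is_male    # rows matching at least one condition

print(sorted(both))
print(sorted(either))
```

With model columns, the first combination would presumably be written as df[(df['First'] != 'John') & (df['Gender'] == 'Male')]. The parentheses matter because & and | bind more tightly than comparison operators in Python, so without them 'John' & df['Gender'] would be evaluated first.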
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __or__(other: SQLDataModel) set[int][source]
Implements the bitwise OR operator | for combining the result sets of self and other.
- Parameters:
other – The SQLDataModel to combine with.
- Returns:
A set of indices representing the union of the result rows from both SQLDataModel instances.
- Return type:
set[int]
Example:
    import sqldatamodel as sdm

    headers = ['First', 'Last', 'Age', 'Service', 'Hired', 'Gender']
    data = [
        ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
        ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female'),
        ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
        ('Sarah', 'West', 51, 0.7, '2023-10-01', 'Female'),
        ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
    ]

    # Create the sample model
    df = sdm.SQLDataModel(data, headers)

    # Apply some filtering conditions to both models
    filter_1 = df[df['Age'] > 40]
    filter_2 = df[df['Gender'] == 'Male']

    # Perform a bitwise OR operation to return a new model
    result = df[filter_1 | filter_2]

    # View result
    print(result)
This will output the result of filtering by ‘Age’ or ‘Gender’:
┌───┬───────┬─────────┬─────┬─────────┬────────────┬────────┐ │ │ First │ Last │ Age │ Service │ Hired │ Gender │ ├───┼───────┼─────────┼─────┼─────────┼────────────┼────────┤ │ 0 │ John │ Smith │ 27 │ 1.22 │ 2023-02-01 │ Male │ │ 2 │ Mike │ Harlin │ 36 │ 3.90 │ 2020-08-27 │ Male │ │ 3 │ Sarah │ West │ 51 │ 0.70 │ 2023-10-01 │ Female │ │ 4 │ Pat │ Douglas │ 42 │ 11.50 │ 2015-11-06 │ Male │ └───┴───────┴─────────┴─────┴─────────┴────────────┴────────┘ [4 rows x 6 columns]
Note
If other is not an instance of SQLDataModel, a NotImplementedError is raised to be consistent with current conventions.
See SQLDataModel.__and__() for the bitwise AND operation.
- Changelog:
- Version 0.7.4 (2024-06-13):
New method.
- __pow__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the ** operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The exponent value to raise each element in the SQLDataModel to.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as exponentiation) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
- Returns:
A new SQLDataModel resulting from the exponential operation.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['x', 'y']
    data = [[2,1], [4,2], [8,3], [16,4], [32,5]]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Perform scalar exponentiation
    df['y ** 2'] = df['y'] ** 2

    # Perform vector exponentiation using another column
    df['x ** y'] = df['x'] ** df['y']

    # View results
    print(df)
This will output:
┌─────┬─────┬────────┬──────────┐ │ x │ y │ y ** 2 │ x ** y │ ├─────┼─────┼────────┼──────────┤ │ 2 │ 1 │ 1 │ 2 │ │ 4 │ 2 │ 4 │ 16 │ │ 8 │ 3 │ 9 │ 512 │ │ 16 │ 4 │ 16 │ 65536 │ │ 32 │ 5 │ 25 │ 33554432 │ └─────┴─────┴────────┴──────────┘ [5 rows x 4 columns]
Note
Mixing exponent types such as int ** float will work; however, an exception will be raised when attempting to exponentiate incompatible types such as str ** float.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __radd__(value: str | int | float | SQLDataModel) SQLDataModel[source]
Implements the right side operand for + operator functionality for compatible SQLDataModel operations.
- Parameters:
value (str | int | float | SQLDataModel) – The value to be added to each element in the SQLDataModel.
- Raises:
TypeError – If the provided value is not a valid type (str, int, float or SQLDataModel).
DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as addition) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
- Returns:
A new SQLDataModel resulting from the addition operation.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['x', 'y']
    data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Perform scalar addition
    df['100 + x'] = 100 + df['x']

    # Perform vector addition using another column
    df['y + x'] = df['y'] + df['x']

    # View both results
    print(df)
This will output:
┌─────┬─────┬─────────┬───────┐ │ x │ y │ 100 + x │ y + x │ ├─────┼─────┼─────────┼───────┤ │ 2 │ 10 │ 102 │ 12 │ │ 4 │ 20 │ 104 │ 24 │ │ 8 │ 30 │ 108 │ 38 │ │ 16 │ 40 │ 116 │ 56 │ │ 32 │ 50 │ 132 │ 82 │ └─────┴─────┴─────────┴───────┘ [5 rows x 4 columns]
We can also use addition to concatenate strings:
    import sqldatamodel as sdm

    # Sample data
    headers = ['First', 'Last']
    data = [['Alice', 'Smith'], ['Bob', 'Johnson'], ['Charlie', 'Hall'], ['David', 'Brown']]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Concatenate scalar character
    df['Prefixed First'] = 'Name: ' + df['First']

    # Concatenate scalar and vector using existing columns
    df['Full Name'] = df['First'] + ' ' + df['Last']

    # View it
    print(df)
This will output:
┌─────────┬─────────┬────────────────┬──────────────┐ │ First │ Last │ Prefixed First │ Full Name │ ├─────────┼─────────┼────────────────┼──────────────┤ │ Alice │ Smith │ Name: Alice │ Alice Smith │ │ Bob │ Johnson │ Name: Bob │ Bob Johnson │ │ Charlie │ Hall │ Name: Charlie │ Charlie Hall │ │ David │ Brown │ Name: David │ David Brown │ └─────────┴─────────┴────────────────┴──────────────┘ [4 rows x 4 columns]
Note
Mixing summands such as int + float will work; however, an exception will be raised when attempting to perform addition on incompatible types such as str + float.
See SQLDataModel.__add__() for left side operand addition or SQLDataModel.__iadd__() for in-place addition.
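For readers unfamiliar with right side operand methods: when the left operand does not know how to handle the right one, Python falls back to the right operand's reflected method. The sketch below uses a hypothetical ColumnSketch class, not SQLDataModel, to show why the distinction matters for string concatenation, where operand order changes the result.

```python
class ColumnSketch:
    """Hypothetical column wrapper used only to show operator dispatch."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # column + scalar: the scalar ends up on the right
        return ColumnSketch(v + other for v in self.values)

    def __radd__(self, other):
        # scalar + column: the scalar stays on the left
        return ColumnSketch(other + v for v in self.values)


first = ColumnSketch(['Alice', 'Bob'])
x = ColumnSketch([2, 4, 8])

print(('Name: ' + first).values)   # dispatched to __radd__
print((first + '!').values)        # dispatched to __add__
print((100 + x).values)            # dispatched to __radd__
```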
- Changelog:
- Version 0.7.3 (2024-06-12):
New method.
- __repr__() str[source]
Returns a pretty printed string representation of SQLDataModel formatted to the current terminal size.
- Returns:
The string representation of the SQLDataModel instance output using display and format values set on instance.
- Return type:
str
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['idx', 'first', 'last', 'age']
    data = [
        (0, 'john', 'smith', 27),
        (1, 'sarah', 'west', 29),
        (2, 'mike', 'harlin', 36),
        (3, 'pat', 'douglas', 42),
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Display the string representation
    print(df)
This will output the default alignment, dynamically aligning columns based on their dtype, right-aligned for numeric, left otherwise:
┌───┬────────┬─────────┬────────┐ │ │ first │ last │ age │ ├───┼────────┼─────────┼────────┤ │ 0 │ john │ smith │ 27 │ │ 1 │ sarah │ west │ 29 │ │ 2 │ mike │ harlin │ 36 │ │ 3 │ pat │ douglas │ 42 │ └───┴────────┴─────────┴────────┘ [4 rows x 3 columns]
Using 'left' column alignment:
    # Using left alignment instead
    df.set_column_alignment("left")

    # See difference
    print(df)
This will output:
┌───┬────────┬─────────┬────────┐ │ │ first │ last │ age │ ├───┼────────┼─────────┼────────┤ │ 0 │ john │ smith │ 27 │ │ 1 │ sarah │ west │ 29 │ │ 2 │ mike │ harlin │ 36 │ │ 3 │ pat │ douglas │ 42 │ └───┴────────┴─────────┴────────┘ [4 rows x 3 columns]
Using 'center' column alignment:
    # Using center alignment instead
    df.set_column_alignment("center")

    # See difference
    print(df)
This will output:
┌───┬────────┬─────────┬────────┐ │ │ first │ last │ age │ ├───┼────────┼─────────┼────────┤ │ 0 │ john │ smith │ 27 │ │ 1 │ sarah │ west │ 29 │ │ 2 │ mike │ harlin │ 36 │ │ 3 │ pat │ douglas │ 42 │ └───┴────────┴─────────┴────────┘ [4 rows x 3 columns]
Using 'right' column alignment:
    # Using right alignment instead
    df.set_column_alignment("right")

    # See difference
    print(df)
This will output:
┌───┬────────┬─────────┬────────┐ │ │ first │ last │ age │ ├───┼────────┼─────────┼────────┤ │ 0 │ john │ smith │ 27 │ │ 1 │ sarah │ west │ 29 │ │ 2 │ mike │ harlin │ 36 │ │ 3 │ pat │ douglas │ 42 │ └───┴────────┴─────────┴────────┘ [4 rows x 3 columns]
Note
Use SQLDataModel.set_display_max_rows() to explicitly set vertical height and modify vertical truncation behavior, which uses current terminal height by default.
Use SQLDataModel.set_min_column_width() and SQLDataModel.set_max_column_width() to adjust column widths and modify horizontal truncation behavior.
Use SQLDataModel.set_column_alignment() to modify column alignment; available options are dynamic alignment based on dtype, left, center or right alignment.
Use SQLDataModel.set_display_color() to modify the table color; by default no color is applied, with characters drawn using platform-specific settings.
Use SQLDataModel.set_table_style() to modify the table style format and box characters used to draw the table.
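The four alignment options differ only in how each cell is padded to the column width. The following sketch shows that padding with Python's built-in string methods. The helper align_cell is hypothetical, and the rule it applies for 'dynamic' (right for numeric values, left otherwise) is taken from the description above rather than from the library's source.

```python
def align_cell(value, width, alignment='dynamic'):
    """Pad a cell to width; 'dynamic' right-aligns numbers and left-aligns the rest."""
    text = f"{value:.2f}" if isinstance(value, float) else str(value)
    if alignment == 'dynamic':
        alignment = 'right' if isinstance(value, (int, float)) else 'left'
    if alignment == 'left':
        return text.ljust(width)
    if alignment == 'center':
        return text.center(width)
    if alignment == 'right':
        return text.rjust(width)
    raise ValueError(f"unknown alignment: {alignment!r}")


# Show one text cell and one numeric cell under each alignment option
for mode in ('dynamic', 'left', 'center', 'right'):
    cells = [align_cell(v, 8, mode) for v in ('john', 27)]
    print(f"{mode:>7}: |" + "|".join(cells) + "|")
```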
- Changelog:
- Version 0.12.0 (2024-07-06):
Changed default behavior to display a minimum of 4 rows when display_max_rows = None to retain data visibility when terminal size is below threshold.
- Version 0.10.4 (2024-07-03):
Modified to escape newline characters through utils.sqlite_printf_format() to avoid wrapping table rows.
- Version 0.7.0 (2024-06-08):
Modified horizontal truncation behavior to alternate column selection between table start and table end instead of sequential left to right ordering.
- __rfloordiv__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the right side operand for // operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to be divided by each element in the SQLDataModel.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
ZeroDivisionError – If value is 0.
- Returns:
A new SQLDataModel resulting from the floor division operation.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['x', 'y']
    data = [[2,8], [4,16], [8,32], [32,64], [32,128]]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Perform scalar floor division
    df['128 // y'] = 128 // df['y']

    # Perform vector floor division using another column
    df['y // x'] = df['y'] // df['x']

    # View both results
    print(df)
This will output:
┌─────┬─────┬──────────┬────────┐ │ x │ y │ 128 // y │ y // x │ ├─────┼─────┼──────────┼────────┤ │ 2 │ 8 │ 16 │ 4 │ │ 4 │ 16 │ 8 │ 4 │ │ 8 │ 32 │ 4 │ 4 │ │ 32 │ 64 │ 2 │ 2 │ │ 32 │ 128 │ 1 │ 4 │ └─────┴─────┴──────────┴────────┘ [5 rows x 4 columns]
Note
Mixing divisor types such as int // float will work; however, an exception will be raised when attempting to perform division on incompatible types such as str // float.
See SQLDataModel.__floordiv__() for standard left side operand implementation of floor division operations.
- Changelog:
- Version 0.7.7 (2024-06-17):
New method.
- __rmul__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the right side operand for * operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to multiply each element in the SQLDataModel by.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as multiplication) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
- Returns:
A new SQLDataModel resulting from the multiplication operation.
- Return type:
SQLDataModel
Note
See SQLDataModel.__mul__() for additional details and usage examples.
This function simply wraps the primary multiplication method after swapping the order of the arguments.
- Changelog:
- Version 0.7.3 (2024-06-12):
New method.
- __rpow__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the right side operand for ** operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The base value to raise to the power of each element in the SQLDataModel.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as exponentiation) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
- Returns:
A new SQLDataModel resulting from the exponential operation.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['x', 'y']
    data = [[2,1], [4,2], [6,3], [8,4], [10,5]]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Perform scalar exponentiation
    df['2 ** y'] = 2 ** df['y']

    # Perform vector exponentiation using another column
    df['y ** x'] = df['y'] ** df['x']

    # View results
    print(df)
This will output:
┌─────┬─────┬────────┬─────────┐ │ x │ y │ 2 ** y │ y ** x │ ├─────┼─────┼────────┼─────────┤ │ 2 │ 1 │ 2 │ 1 │ │ 4 │ 2 │ 4 │ 16 │ │ 6 │ 3 │ 8 │ 729 │ │ 8 │ 4 │ 16 │ 65536 │ │ 10 │ 5 │ 32 │ 9765625 │ └─────┴─────┴────────┴─────────┘ [5 rows x 4 columns]
Note
Mixing exponent types such as int ** float will work; however, an exception will be raised when attempting to exponentiate incompatible types such as str ** float.
See SQLDataModel.__pow__() for standard left side operand implementation of exponential operations.
- Changelog:
- Version 0.7.7 (2024-06-17):
New method.
- __rsub__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the right side operand for - operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value from which each element in the SQLDataModel is subtracted.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as subtraction) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
- Returns:
A new SQLDataModel resulting from the subtraction operation.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['x', 'y']
    data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Perform scalar subtraction
    df['100 - x'] = 100 - df['x']

    # Perform vector subtraction using another column
    df['y - x'] = df['y'] - df['x']

    # View both results
    print(df)
This will output:
┌─────┬─────┬─────────┬───────┐ │ x │ y │ 100 - x │ y - x │ ├─────┼─────┼─────────┼───────┤ │ 2 │ 10 │ 98 │ 8 │ │ 4 │ 20 │ 96 │ 16 │ │ 8 │ 30 │ 92 │ 22 │ │ 16 │ 40 │ 84 │ 24 │ │ 32 │ 50 │ 68 │ 18 │ └─────┴─────┴─────────┴───────┘ [5 rows x 4 columns]
Note
Mixing operand types such as int - float will work; however, an exception will be raised when attempting to perform subtraction on incompatible types such as str - float.
See SQLDataModel.__sub__() for left side operand subtraction or SQLDataModel.__isub__() for in-place subtraction.
- Changelog:
- Version 0.7.3 (2024-06-12):
New method.
- __rtruediv__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the right side operand for / operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to be divided by each element in the SQLDataModel.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
ZeroDivisionError – If value is 0.
- Returns:
A new SQLDataModel resulting from the division operation.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['x', 'y']
    data = [[2,10], [4,20], [8,30], [16,40], [32,50]]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Perform scalar division
    df['10 / y'] = 10 / df['y']

    # Perform vector division using another column
    df['x / y'] = df['x'] / df['y']

    # View both results
    print(df)
This will output:
┌─────┬─────┬────────┬───────┐ │ x │ y │ 10 / y │ x / y │ ├─────┼─────┼────────┼───────┤ │ 2 │ 10 │ 1.00 │ 0.20 │ │ 4 │ 20 │ 0.50 │ 0.20 │ │ 8 │ 30 │ 0.33 │ 0.27 │ │ 16 │ 40 │ 0.25 │ 0.40 │ │ 32 │ 50 │ 0.20 │ 0.64 │ └─────┴─────┴────────┴───────┘ [5 rows x 4 columns]
Note
Mixing divisor types such as int / float will work; however, an exception will be raised when attempting to perform division on incompatible types such as str / float.
See SQLDataModel.__truediv__() for left side operand division operations.
- Changelog:
- Version 0.7.3 (2024-06-12):
New method.
- __setitem__(target_indicies, update_values) None[source]
Updates specified rows and columns in the SQLDataModel with the provided values.
- Parameters:
target_indicies – Indices specifying the rows and columns to be updated. This can be an integer, a tuple, a slice, or a combination of these.
update_values – The values to be assigned to the corresponding model records. It can be of types: str, int, float, bool, bytes, list, tuple, or another SQLDataModel object.
- Raises:
TypeError – If the update_values type is not compatible with SQL datatypes.
DimensionError – If there is a shape mismatch between targeted indices and provided update values.
ValueError – If there are issues with the specified indices, such as invalid row or column names.
- Returns:
None
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['Name', 'Age', 'Job']
    data = [
        ('Billy', 30, 'Barber'),
        ('Alice', 28, 'Doctor'),
        ('John', 25, 'Technician'),
        ('Travis', 35, 'Musician'),
        ('William', 15, 'Student')
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Update a specific row with new values
    df[2] = ("John", 25, "Engineer")

    # See result
    print(df)
This will output:
┌───┬─────────┬──────┬──────────┐ │ │ Name │ Age │ Job │ ├───┼─────────┼──────┼──────────┤ │ 0 │ Billy │ 30 │ Barber │ │ 1 │ Alice │ 28 │ Doctor │ │ 2 │ John │ 25 │ Engineer │ │ 3 │ Travis │ 35 │ Musician │ │ 4 │ William │ 15 │ Student │ └───┴─────────┴──────┴──────────┘ [5 rows x 3 columns]
Conditional updates can also be made using multiple columns:
    import sqldatamodel as sdm

    headers = ['Employee', 'Base', 'Salary']
    data = [
        ('Alice', '58,500', '62,250'),
        ('Bobby', '60,750', None),
        ('Chloe', '58,500', '63,125'),
        ('David', '65,000', None),
        ('Ellie', '65,000', None),
        ('Fiona', '65,000', '71,450'),
    ]

    # Create sample model
    df = sdm.SQLDataModel(data, headers)

    # Selectively update values based on conditions
    df[df['Salary'].isna(), 'Salary'] = df['Base']

    # View updates
    print(df)
This will output the resulting model where ‘Salary’ was updated with values from ‘Base’ only if missing:
┌───┬──────────┬────────┬────────┐ │ │ Employee │ Base │ Salary │ ├───┼──────────┼────────┼────────┤ │ 0 │ Alice │ 58,500 │ 62,250 │ │ 1 │ Bobby │ 60,750 │ 60,750 │ │ 2 │ Chloe │ 58,500 │ 63,125 │ │ 3 │ David │ 65,000 │ 65,000 │ │ 4 │ Ellie │ 65,000 │ 65,000 │ │ 5 │ Fiona │ 65,000 │ 71,450 │ └───┴──────────┴────────┴────────┘ [6 rows x 3 columns]
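The conditional update above amounts to a masked assignment: the set of row indices where 'Salary' is missing selects which rows take their value from 'Base'. A plain-Python sketch of the same logic, using lists in place of model columns and not the library's internals, looks like this:

```python
base = ['58,500', '60,750', '58,500', '65,000', '65,000', '65,000']
salary = ['62,250', None, '63,125', None, None, '71,450']

# Row indices where 'Salary' is missing act as the mask
mask = {idx for idx, value in enumerate(salary) if value is None}

# Only masked rows take their value from 'Base'; all others are left alone
updated = [base[idx] if idx in mask else value for idx, value in enumerate(salary)]

print(sorted(mask))
print(updated)
```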
Values for multiple columns can also be set:
    # Update multiple rows and columns with a list of values
    df[1:5, ["Name", "Age", "Job"]] = [
        ("Alice", 30, "Manager"),
        ("Bob", 28, "Developer"),
        ("Charlie", 35, "Designer"),
        ("David", 32, "Analyst"),
    ]

    # See result
    print(df)
This will output:
┌───┬─────────┬──────┬───────────┐ │ │ Name │ Age │ Job │ ├───┼─────────┼──────┼───────────┤ │ 0 │ Billy │ 30 │ Barber │ │ 1 │ Alice │ 30 │ Manager │ │ 2 │ Bob │ 28 │ Developer │ │ 3 │ Charlie │ 35 │ Designer │ │ 4 │ David │ 32 │ Analyst │ └───┴─────────┴──────┴───────────┘ [5 rows x 3 columns]
Values can also be set along the row axes:
    # Create a new column "Hobby" and set the values
    df["Hobby"] = [('Fishing',), ('Biking',), ('Computers',), ('Photography',), ('Studying',)]

    # See result
    print(df)
This will output:
┌───┬─────────┬──────┬───────────┬─────────────┐ │ │ Name │ Age │ Job │ Hobby │ ├───┼─────────┼──────┼───────────┼─────────────┤ │ 0 │ Billy │ 30 │ Barber │ Fishing │ │ 1 │ Alice │ 30 │ Manager │ Biking │ │ 2 │ Bob │ 28 │ Developer │ Computers │ │ 3 │ Charlie │ 35 │ Designer │ Photography │ │ 4 │ David │ 32 │ Analyst │ Studying │ └───┴─────────┴──────┴───────────┴─────────────┘ [5 rows x 4 columns]
Note
If update_values is another SQLDataModel object, its data will be normalized using the SQLDataModel.data() method.
The target_indicies parameter can be an integer, a tuple of disconnected row indices, a slice representing a range of rows, a string or list of strings representing column names, or a tuple combining row and column indices.
Values can be single values or iterables matching the specified rows and columns.
See SQLDataModel.apply() for setting values using a function.
- Changelog:
- Version 0.7.5 (2024-06-14):
Added row indices masking to allow selective updating when update_values is also an instance of SQLDataModel, using target_indicies as the mask.
- Version 0.1.9 (2024-03-19):
New method.
- __sub__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the - operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to subtract from each element in the SQLDataModel.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as subtraction) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
- Returns:
A new SQLDataModel resulting from the subtraction operation.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm # Sample data headers = ['x', 'y'] data = [[2,10], [4,20], [8,30], [16,40], [32,50]] # Create the model df = sdm.SQLDataModel(data, headers) # Perform scalar subtraction df['x - 100'] = df['x'] - 100 # Perform vector subtraction using another column df['x - y'] = df['x'] - df['y'] # View both results print(df)
This will output:
┌─────┬─────┬─────────┬───────┐ │ x │ y │ x - 100 │ x - y │ ├─────┼─────┼─────────┼───────┤ │ 2 │ 10 │ -98 │ -8 │ │ 4 │ 20 │ -96 │ -16 │ │ 8 │ 30 │ -92 │ -22 │ │ 16 │ 40 │ -84 │ -24 │ │ 32 │ 50 │ -68 │ -18 │ └─────┴─────┴─────────┴───────┘ [5 rows x 4 columns]
Note
Mixing operand types such as int - float will work, however an exception will be raised when attempting to perform subtraction on incompatible types such as str - float.
See SQLDataModel.__rsub__() for right side operand subtraction operations.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- __truediv__(value: int | float | SQLDataModel) SQLDataModel[source]
Implements the / operator functionality for compatible SQLDataModel operations.
- Parameters:
value (int | float | SQLDataModel) – The value to divide each element in the SQLDataModel by.
- Raises:
TypeError – If the provided value is not a valid type (int, float or SQLDataModel).
DimensionError – Raised when the dimensions of the provided value are incompatible with the current model’s dimensions. For example, attempting to perform an operation (such as division) on data of shape (4, 1) with values of shape (3, 2) will raise this exception.
ZeroDivisionError – If value is 0.
- Returns:
A new SQLDataModel resulting from the division operation.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm # Sample data headers = ['x', 'y'] data = [[2,10], [4,20], [8,30], [16,40], [32,50]] # Create the model df = sdm.SQLDataModel(data, headers) # Perform scalar division df['y / 10'] = df['y'] / 10 # Perform vector division using another column df['y / x'] = df['y'] / df['x'] # View both results print(df)
This will output:
┌─────┬─────┬────────┬───────┐ │ x │ y │ y / 10 │ y / x │ ├─────┼─────┼────────┼───────┤ │ 2 │ 10 │ 1.00 │ 5.00 │ │ 4 │ 20 │ 2.00 │ 5.00 │ │ 8 │ 30 │ 3.00 │ 3.75 │ │ 16 │ 40 │ 4.00 │ 2.50 │ │ 32 │ 50 │ 5.00 │ 1.56 │ └─────┴─────┴────────┴───────┘ [5 rows x 4 columns]
Note
Mixing divisor types such as int / float will work, however an exception will be raised when attempting to perform division on incompatible types such as str / float.
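The scalar versus element-wise behavior and the shape check described for the arithmetic operators can be sketched in plain Python. The helper name elementwise_divide is hypothetical, and ValueError stands in for the package's DimensionError; this is only an illustration of the rule, not the package's code:

```python
def elementwise_divide(left, right):
    """Divide rows of ``left`` by a scalar or by a same-shaped nested list."""

    def shape(rows):
        return (len(rows), len(rows[0]) if rows else 0)

    # Scalar divisor: applied to every element
    if isinstance(right, (int, float)):
        if right == 0:
            raise ZeroDivisionError("division by zero")
        return [[value / right for value in row] for row in left]

    # Nested-list divisor: shapes must agree
    # (ValueError is a stand-in for the package's DimensionError)
    if shape(left) != shape(right):
        raise ValueError(f"shape mismatch: {shape(left)} vs {shape(right)}")
    return [[a / b for a, b in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]


y = [[10], [20], [30], [40], [50]]
x = [[2], [4], [8], [16], [32]]

scalar_result = elementwise_divide(y, 10)  # every value divided by 10
vector_result = elementwise_divide(y, x)   # row-by-row division
print(scalar_result)
print(vector_result)
```

Dividing data of shape (4, 1) by values of shape (3, 2) fails the shape comparison in this sketch, mirroring the DimensionError case described under Raises.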
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- _calculate_col_widths(index: bool = None, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, split_row: int = None, index_rep: str = None) dict[str, int][source]
Calculate the maximum column widths for each header column based on the provided conditions to assist with representation methods.
- Parameters:
index (bool, optional) – Indicates whether to include the index column in the calculations. Default is None, using SQLDataModel.display_index value.
min_column_width (int, optional) – Minimum width for columns. Default is None, using SQLDataModel.min_column_width value.
max_column_width (int, optional) – Maximum width for columns. Default is None, using SQLDataModel.max_column_width value.
float_precision (int, optional) – Precision for displaying float values. Default is None, using SQLDataModel.display_float_precision value.
split_row (int, optional) – Row index to determine vertical truncation. If None, no vertical truncation is applied.
index_rep (str, optional) – String representation for the index. If None, uses a single whitespace character ' ' to represent the index column.
- Returns:
Dictionary mapping each header column to its maximum calculated width as {'column': width}.
- Return type:
dict[str, int]
Example:
import sqldatamodel as sdm # Sample data headers = ['User', 'Key', 'Value', 'Active'] data = [ ('Allison', 130, 237.03, True), ('Bobby', -400, 723.41, False), ('Connor', 698, 154.70, False), ('Dimitry', 287, 409.14, True) ] # Create the model df = sdm.SQLDataModel(data, headers) # Calculate the max column widths col_widths = df._calculate_col_widths() # View result print(col_widths)
This will output the maximum widths for each column calculated by the column name and the corresponding values:
{'idx': 1, 'User': 7, 'Key': 4, 'Value': 7, 'Active': 6}
Note
When index_rep is provided, the length of the index representation will be used when calculating the maximum column width.
When split_row is provided, width calculation checks are restricted to only the top and bottom N number of rows specified.
Used by SQLDataModel.to_string() to determine appropriate column representation widths.
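The general idea of a width calculation bounded by minimum and maximum values can be sketched in plain Python. The function column_widths and its sample data are hypothetical; the package performs its calculation against the underlying SQL table and handles the index column separately, so the numbers here are not expected to match the output above:

```python
def column_widths(headers, rows, min_width=3, max_width=38, float_precision=2):
    """Return {'column': width} from the header and its widest formatted value."""
    widths = {}
    for position, header in enumerate(headers):
        longest = len(header)
        for row in rows:
            value = row[position]
            # Floats are measured at the display precision
            if isinstance(value, float):
                text = f"{value:.{float_precision}f}"
            else:
                text = str(value)
            longest = max(longest, len(text))
        # Clamp the result between the minimum and maximum widths
        widths[header] = max(min_width, min(longest, max_width))
    return widths


headers = ["City", "Population", "Area"]
rows = [("Oslo", 709037, 454.0), ("Reykjavik", 139875, 273.0)]
widths = column_widths(headers, rows)
print(widths)
```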
- Changelog:
- Version 2.3.0 (2026-01-21):
Modified handling of bytes/blob data to use sqlite’s quote function instead of printf to align with related methods in order to improve bytes representation
- Version 0.11.0 (2024-07-05):
New method.
- _generate_sql_stmt(columns: list[str] = None, rows: int | slice | tuple | str = None, index: bool = True, na_rep: str = None) str[source]
Generates an SQL statement for fetching specific columns and rows from the model. Duplicate column references are aliased in order of appearance.
- Parameters:
columns (list of str, optional) – The list of columns to include in the SQL statement. If not provided, all columns from the model will be included.
rows (int, slice, tuple, str, optional) – The rows to include in the SQL statement. It can be an integer for a single row, a slice for a range of rows, a tuple for specific row indices, or a string predicate used directly in place of a numeric range-based selection. If not provided, all rows will be included.
index (bool, optional) – If True, include the primary index column in the SQL statement.
na_rep (str, optional) – If provided, all null or empty string values are replaced with value.
- Returns:
The generated SQL statement.
- Return type:
str
Note
No validation is performed on row or column indices, see SQLDataModel._validate_indicies() for implementation and usage.
See SQLDataModel._generate_sql_stmt_fetchall() for fetching all model data without predicates or filters.
- Changelog:
- Version 0.5.1 (2024-05-10):
Modified to allow rows argument to be provided directly as a string predicate to bypass numeric range-based selections.
- Version 0.4.0 (2024-04-23):
Added na_rep parameter to fill null or missing fields with provided value.
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.9 (2024-03-19):
New method.
- _generate_sql_stmt_fetchall(index: bool = True) str[source]
Generates an SQL statement for fetching all current rows and columns in SQLDataModel.
- Parameters:
index (bool, optional) – Whether or not to include index column in the SQL statement. Default is True, including the index.
- Returns:
The generated SQL statement selecting all rows and columns.
- Return type:
str
Example:
import sqldatamodel as sdm # Create a sample model df = sdm.from_shape(shape=(10,3), headers=['Name','Age','Sex']) # Generate an SQL statement for all data sql_stmt = df._generate_sql_stmt_fetchall(index=False) # View it print(sql_stmt)
This will output the statement required to fetch all the data:
SELECT "Name" AS "Name", "Age" AS "Age", "Sex" AS "Sex" FROM "sdm" ORDER BY "idx"
Note
Used internally for methods selecting all the current rows and columns.
See SQLDataModel._generate_sql_stmt() for generating statements for specified rows and columns only.
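A statement of the shape shown above can be assembled from a list of headers with quoted and aliased identifiers. The sketch below is an assumption about how such a builder might look. The helper build_fetchall_stmt and its handling of index=True are hypothetical, and only the index=False output is taken from the example above:

```python
import sqlite3


def build_fetchall_stmt(headers, table="sdm", index_name="idx", index=True):
    """Hypothetical builder for a select-all statement with aliased columns."""

    def quote(name):
        # Double any embedded double quotes so the identifier stays valid
        return '"' + name.replace('"', '""') + '"'

    columns = ([index_name] if index else []) + list(headers)
    select_list = ", ".join(f"{quote(col)} AS {quote(col)}" for col in columns)
    return f"SELECT {select_list} FROM {quote(table)} ORDER BY {quote(index_name)}"


stmt = build_fetchall_stmt(["Name", "Age", "Sex"], index=False)
print(stmt)

# Run the generated statement against a small in-memory table
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "Name" TEXT, "Age" INTEGER, "Sex" TEXT)')
con.executemany('INSERT INTO "sdm" VALUES (?, ?, ?, ?)', [(1, "Bo", 41, "M"), (0, "Al", 29, "F")])
rows = con.execute(stmt).fetchall()
print(rows)
```

Ordering by the index column keeps the fetched rows in model order regardless of insertion order.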
- Changelog:
- Version 0.10.0 (2024-06-29):
New method.
- _generate_table_style(style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = None) tuple[tuple[str]][source]
Generates the character sets required for formatting SQLDataModel according to the value currently set at SQLDataModel.table_style.
- Parameters:
style (Literal['ascii','bare','dash','default','double','latex','list','markdown','outline','pandas','polars','postgresql','round','rst-grid','rst-simple'], optional) – The table style to return. Default is value set on SQLDataModel.table_style.
- Returns:
A 4-tuple containing the characters required for top, middle, row and lower table sections.
- Return type:
tuple[tuple[str]]
Note
This method is called by SQLDataModel.__repr__() to parse the characters necessary for constructing the tabular representation of the SQLDataModel. Any modifications or changes to this method may result in unexpected behavior.
- Changelog:
- Version 0.11.0 (2024-07-05):
Added ‘latex’ style format.
- Version 0.9.3 (2024-06-28):
Added ‘rst-simple’ and ‘rst-grid’ style formats.
- Version 0.3.10 (2024-04-16):
Added style parameter to allow use by SQLDataModel.to_text() to generate new formatting styles introduced in version 0.3.9.
- Version 0.3.8 (2024-04-12):
New method.
- _get_display_args(include_dtypes: bool = False) dict[source]
Retrieves the current display configuration settings of the SQLDataModel with the correct kwargs for the class SQLDataModel.__init__() method.
- Parameters:
include_dtypes (bool, optional) – Whether SQLDataModel.dtypes should be included in the result. Default is False, including only display arguments.
- Returns:
A dictionary containing the display configuration settings in the format {'setting': 'value'}.
- Return type:
dict
- Display Properties:
SQLDataModel.display_max_rows: The maximum number of rows to display.
SQLDataModel.min_column_width: The minimum width of columns when displaying the model.
SQLDataModel.max_column_width: The maximum width of columns when displaying the model.
SQLDataModel.column_alignment: The alignment of columns (‘left’, ‘center’, ‘right’ or ‘dynamic’).
SQLDataModel.display_color: The color to use when displaying the table, None by default.
SQLDataModel.display_index: True if displaying index column, False otherwise.
SQLDataModel.display_float_precision: The precision for displaying floating-point numbers.
SQLDataModel.table_style: The table styling format to use for string representations of the model.
- Dtype Property:
SQLDataModel.dtypes: A dictionary mapping the current model’s columns to their corresponding Python data type.
- Changelog:
- Version 0.6.2 (2024-05-15):
Added include_dtypes parameter for use by methods such as SQLDataModel.min() and SQLDataModel.max() for operations that require returning the results of SQL fetch statements.
- Version 0.1.9 (2024-03-19):
New method.
- _get_sql_create_stmt() str[source]
Retrieves the SQL create table statement used to create the current SQLDataModel.
- Returns:
The SQL create table statement for the SQLDataModel.
- Return type:
str
Example:
import sqldatamodel as sdm headers = ['idx', 'first', 'last', 'age'] data = [ (0, 'john', 'smith', 27) ,(1, 'sarah', 'west', 29) ,(2, 'mike', 'harlin', 36) ,(3, 'pat', 'douglas', 42) ] # Create the sample model df = sdm.SQLDataModel(data,headers) # Retrieve the create statement for the SQLDataModel create_stmt = df._get_sql_create_stmt() # Print the returned statement print(create_stmt)
This will output:
CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY,"first" TEXT,"last" TEXT,"age" INTEGER)
- Changelog:
- Version 0.1.5 (2023-11-24):
New method.
- _update_indicies() None[source]
Updates the SQLDataModel.indicies and SQLDataModel.row_count properties of the SQLDataModel instance representing the current valid row indices and count.
- Returns:
None
Note
This method is called internally any time the SQLDataModel.row_count property is subject to change, or data manipulation requires updating the current values.
There is no reason to call this method manually unless the model has been changed outside of the standard instance methods.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- _update_indicies_deterministic(row_index: int) None[source]
Quick implementation to update the SQLDataModel.indicies and SQLDataModel.row_count properties of the SQLDataModel instance representing the current valid row indices and count based on the last inserted rowid.
- Returns:
None
Note
This method is called internally any time the SQLDataModel.row_count property is subject to deterministic change to avoid the more expensive call to SQLDataModel._update_indicies().
- Changelog:
- Version 0.6.0 (2024-05-14):
Improves performance for updating row indicies when update is deterministic.
New method.
- _update_model_metadata(update_row_meta: bool = False) None[source]
Generates and updates metadata information about the columns and optionally the rows in the SQLDataModel instance based on the current model.
- Attributes updated:
SQLDataModel.header_master: Master dictionary of column metadata.
SQLDataModel.headers: List of current model headers, order retained.
SQLDataModel.column_count: Number of columns in current model.
SQLDataModel.shape: The current (rows, cols) dimensions of the model.
SQLDataModel.dtypes: The current {'col': 'dtype'} mapping of the model.
SQLDataModel.indicies: Optionally updated, represents current valid row indices.
SQLDataModel.row_count: Optionally updated, represents current row count.
- Parameters:
update_row_meta (bool, optional) – If True, updates row metadata information; otherwise, retrieves column metadata only (default).
- Returns:
None
Example:
import sqldatamodel as sdm headers = ['idx', 'first', 'last', 'age', 'service_time'] data = [ (0, 'john', 'smith', 27, 1.22), (1, 'sarah', 'west', 39, 0.7), (2, 'mike', 'harlin', 36, 3), (3, 'pat', 'douglas', 42, 11.5) ] # Create the model with sample data df = sdm.SQLDataModel(data, headers) # View header master print(df.header_master)
This will output:
{'first': ('TEXT', 'str', True, '<'), 'last': ('TEXT', 'str', True, '<'), 'age': ('INTEGER', 'int', True, '>'), 'service_time': ('REAL', 'float', True, '>'), 'idx': ('INTEGER', 'int', False, '>')}
Example Attributes Modified:
import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age', 'service_time']
data = [
    (0, 'john', 'smith', 27, 1.22),
    (1, 'sarah', 'west', 39, 0.7),
    (2, 'mike', 'harlin', 36, 3),
    (3, 'pat', 'douglas', 42, 11.5)
]

# Create the model with sample data
df = sdm.SQLDataModel(data, headers)

# Get current column count
num_cols_before = df.column_count

# Add new column
df['new_column'] = 'empty'

# Method is called behind the scenes
df._update_model_metadata()

# Get new column count
num_cols_after = df.column_count

# View difference
print(f"cols before: {num_cols_before}, cols after: {num_cols_after}")
Note
This method is called after operations that may modify the current model’s structure and require synchronization.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- _update_rows_and_columns_with_values(rows_to_update: tuple[int] = None, columns_to_update: list[str] = None, values_to_update: list[tuple] = None) None[source]
Generates and executes a SQL update statement to modify specific rows and columns with provided values in the SQLDataModel.
- Parameters:
rows_to_update – A tuple of row indices to be updated. If set to None, it defaults to all rows in the SQLDataModel.
columns_to_update – A list of column names to be updated. If set to None, it defaults to all columns in the SQLDataModel.
values_to_update – A list of tuples representing values to update in the specified rows and columns.
- Raises:
TypeError – If the values_to_update parameter is not a list or tuple.
DimensionError – If the shape of the provided values does not match the specified rows and columns.
SQLProgrammingError – If the values_to_update parameter contains invalid or SQL incompatible data.
Example:
import sqldatamodel as sdm # Create the model with some sample data df = sdm.SQLDataModel( data=[(23, 'W'), (24, 'X'), (25, 'Y'), (26, 'Z')], headers=['column1', 'column2'] ) # Update specific rows and columns with provided values df._update_rows_and_columns_with_values( rows_to_update=(1, 2, 3), columns_to_update=["column1", "column2"], values_to_update=[(10, 'A'), (20, 'B'), (30, 'C')] ) # Create a new column named "new_column" with default values df._update_rows_and_columns_with_values( columns_to_update=["new_column"], values_to_update=[(None,)] * df.row_count )
Note
Used by SQLDataModel.__setitem__() to broadcast updates across row and column index ranges.
To create a new column, pass a single header item in a list to the columns_to_update parameter.
To copy an existing column, pass the corresponding data as a list of tuples to the values_to_update parameter.
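Updating a set of rows and columns with one tuple of values per row can be expressed as a parameterized UPDATE executed once per row. The sketch below assumes a table named "sdm" keyed on "idx", as in the create statement shown earlier in this reference. The helper update_rows_and_columns is hypothetical, and ValueError stands in for DimensionError:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "column1" INTEGER, "column2" TEXT)')
con.executemany(
    'INSERT INTO "sdm" VALUES (?, ?, ?)',
    [(0, 23, "W"), (1, 24, "X"), (2, 25, "Y"), (3, 26, "Z")],
)


def update_rows_and_columns(con, rows, columns, values):
    """Hypothetical helper: one tuple of values per target row."""
    # Shape check (ValueError is a stand-in for the package's DimensionError)
    if len(values) != len(rows) or any(len(v) != len(columns) for v in values):
        raise ValueError("shape of values does not match rows and columns")
    assignments = ", ".join(f'"{col}" = ?' for col in columns)
    sql = f'UPDATE "sdm" SET {assignments} WHERE "idx" = ?'
    # Bind each row's values followed by its row index
    con.executemany(sql, [(*row_values, row) for row, row_values in zip(rows, values)])


update_rows_and_columns(
    con,
    rows=(1, 2, 3),
    columns=["column1", "column2"],
    values=[(10, "A"), (20, "B"), (30, "C")],
)
result = con.execute('SELECT * FROM "sdm" ORDER BY "idx"').fetchall()
print(result)
```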
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- _validate_column(column: str | int | slice | Iterable, unmodified: bool = False) list[str][source]
Utility function used to validate column selection and return parsed values.
- Parameters:
column (str|int|slice|Iterable) – The column selection to validate, argument should reflect the integer indexes or column names.
unmodified (bool, optional) – Whether column should be returned as originally indexed. Default is False, returning as list.
- Raises:
TypeError – If column is not one of type ‘str’, ‘int’, ‘slice’ or ‘Iterable’ representing the integer index or values of column(s) to select.
IndexError – If column is outside of current model range bounded by SQLDataModel.column_count whether positively or negatively indexed.
DimensionError – If len(column) is greater than SQLDataModel.column_count when provided as an iterable or sequence.
- Returns:
A list containing the validated column values resulting from the selection.
- Return type:
list[str]
Example:
import sqldatamodel as sdm

# Create a 10 rows x 6 column model
df = sdm.from_shape((10, 6), headers=['A','B','C','D','E','F'])

# Various column index types
column_indicies = [
    2,
    -3,
    ['A','B'],
    (4, 5, 1),
    [-1, 2, 0],
    slice(1, 3),
    slice(-6, -1, 2),
]

# Loop over the indicies
for column_index in column_indicies:
    # Print original and validated indexes, calling the method on the model instance
    print(f"{column_index} --> {df._validate_column(column_index)}")
This will output the original and validated column indexes:
2 --> ['C'] -3 --> ['D'] ['A', 'B'] --> ['A', 'B'] (4, 5, 1) --> ['E', 'F', 'B'] [-1, 2, 0] --> ['F', 'C', 'A'] slice(1, 3, None) --> ['B', 'C'] slice(-6, -1, 2) --> ['A', 'C', 'E']
Note
Columns are referenced by their integer index or directly by their value as a column name. When using integers, column = 0 and column = -1 will always return the first and last columns, respectively.
Validated column outputs will be returned as a list containing the results of the indexed columns found at SQLDataModel.headers with original ordering intact.
See SQLDataModel._validate_row() for validating row indices and returning the corresponding values.
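The mapping from mixed column selectors to a list of header names can be sketched in plain Python. The helper normalize_columns is hypothetical, and the exceptions it raises are only approximations of those listed under Raises:

```python
def normalize_columns(selection, headers):
    """Hypothetical helper resolving a column selection to a list of names."""

    def resolve(item):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(item, bool):
            raise TypeError("boolean is not a valid column index")
        if isinstance(item, int):
            if not -len(headers) <= item < len(headers):
                raise IndexError(f"column index {item} out of range")
            return headers[item]
        if isinstance(item, str):
            if item not in headers:
                raise ValueError(f"column '{item}' not found")
            return item
        raise TypeError(f"invalid column index type: {type(item).__name__}")

    # Slices are applied directly to the ordered header list
    if isinstance(selection, slice):
        return headers[selection]
    # Single integer or name
    if isinstance(selection, (int, str)):
        return [resolve(selection)]
    # Any other iterable of integers or names
    return [resolve(item) for item in selection]


headers = ["A", "B", "C", "D", "E", "F"]
for selection in (2, -3, ["A", "B"], (4, 5, 1), slice(1, 3), slice(-6, -1, 2)):
    print(f"{selection} --> {normalize_columns(selection, headers)}")
```

Negative integers count from the end of the header list, which is why -1 always resolves to the last column.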
- Changelog:
- Version 0.7.9 (2024-06-20):
New method.
- _validate_indicies(indicies) tuple[tuple[int], list[str]][source]
Validates and returns a predictable notation form of indices for accessing rows and columns in the SQLDataModel from varying 2-dimensional (row, column) indexing input types.
- Two dimensional indexing:
tuple[row_index, column_index]: Where row_index and column_index are defined below.
- Row indexing:
int: Single integer index. E.g., sdm[0] or sdm[-1]
slice: Range of row indices. E.g., sdm[2:5] or sdm[-8:-1]
set[int]: Discontiguous row indices. E.g., sdm[{13, 7, 42}]
tuple[int]: Like set, discontiguous row indices. E.g., sdm[(-1, 9, 11)]
- Column indexing:
int: Single integer index. E.g., sdm[:, 0] or sdm[:, -1]
str: Single column name. E.g., sdm['Col A'] or sdm['Name']
list[str]: List of column names. E.g., sdm[:,['A', 'B', 'F']]
list[int]: List of column indices. E.g., sdm[:,[0, 3, 4, 9, -2]]
- Parameters:
indicies – Specifies the indices for rows and columns.
- Raises:
TypeError – If the type of indices is invalid such as a float for row index or a boolean for a column name index.
ValueError – If the indices are outside the current model range or if a column is not found in the current model headers when indexed by column name as str.
IndexError – If the column indices are outside the current column range or if a column is not found in the current model headers when indexed by int.
- Returns:
A tuple containing validated row indices as a tuple and validated column indices as a list of column names.
- Return type:
tuple[tuple[int], list[str]]
Example:
import sqldatamodel as sdm # Create a 10 rows by 4 columns model df = sdm.from_shape(shape=(10,4), headers=['A','B','C','D']) # Index pairs to validate input_idx = [ (0, 'A'), (-1, ['B','D']), ({2,-7,-2}, (-2,-1)), (slice(-6,-1,2), slice(0,3)) ] # Store validated pairs valid_idx = [] # Loop over the [row, col] pairs for row, col in input_idx: # Validated and store the pairs valid_idx.append(df._validate_indicies((row, col))) # View input and validated pairs for original, validated in zip(input_idx, valid_idx): print(f"{original} --> {validated}")
This will output both the input and validated row, column index pairs:
(0, 'A') --> ((0,), ['A'])
(-1, ['B', 'D']) --> ((9,), ['B', 'D'])
({-7, 2, -2}, (-2, -1)) --> ((3, 2, 8), ['C', 'D'])
(slice(-6, -1, 2), slice(0, 3, None)) --> ((4, 6, 8), ['A', 'B', 'C'])
Note
This method expects indices to be provided as a two dimensional pair of (row, column) indices, with exceptions made for single row integer indexes or single column names.
Use empty slice notation to include all indices from a given dimension, for example sdm[:, :] will always return the full model by accessing all rows and all columns.
See SQLDataModel.__getitem__() and SQLDataModel.__setitem__() for implementations relying on this method.
See SQLDataModel._validate_row() for one dimensional validation against a single row index.
See SQLDataModel._validate_column() for one dimensional validation against a single column index.
- Changelog:
- Version 0.8.1 (2024-06-23):
Modified implementation to leverage new utility methods SQLDataModel._validate_row() and SQLDataModel._validate_column() to improve performance.
New method.
- _validate_row(row: int | slice | Iterable[int], unmodified: bool = False, allow_zero_rows: bool = True) tuple[int][source]
Utility function used to validate row selection and return parsed values.
- Parameters:
row (int|slice|Iterable[int]) – The row selection to validate, argument should reflect the integer indexes of the rows to select.
unmodified (bool, optional) – Whether row should be returned as originally indexed. Default is False, returning as tuple.
allow_zero_rows (bool, optional) – Whether row, when provided as a slice, is allowed to return zero valid row indices. Default is True, validating on any slice argument.
- Raises:
TypeError – If row is not one of type ‘int’, ‘slice’ or ‘Iterable’ representing the integer index of row(s) to select.
IndexError – If row is outside of current model range bounded by SQLDataModel.row_count whether positively or negatively indexed. Is not raised when allow_zero_rows is True and row is provided as a slice of indices.
- Returns:
A tuple containing the validated row values resulting from the selection.
- Return type:
tuple[int]
Example:
import sqldatamodel as sdm

# Create a 10 rows x 3 column model
df = sdm.from_shape(shape=(10, 3), headers=['A','B','C'])

# Various row index types
row_indicies = [
    2,
    -3,
    {4,5,8},
    [-1,5,0],
    slice(2, 5),
    slice(-8, -1, 2),
]

# Loop over the indicies
for row_index in row_indicies:
    # Print original and validated indexes, calling the method on the model instance
    print(f"{row_index} --> {df._validate_row(row_index)}")
This will output the original and validated row indexes:
2 --> (2,)
-3 --> (7,)
{8, 4, 5} --> (8, 4, 5)
[-1, 5, 0] --> (9, 5, 0)
slice(2, 5, None) --> (2, 3, 4)
slice(-8, -1, 2) --> (2, 4, 6, 8)
Note
Rows are referenced by their integer index and not their value, as such row = 0 and row = -1 will always return the first and last rows, respectively.
An input of row == SQLDataModel.row_count is allowed to accommodate the append row syntax of sdm[sdm.row_count] = (values).
See SQLDataModel._validate_column() for validating column indices and returning the corresponding headers.
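Resolving negative and sliced row selections against a known row count can be sketched in plain Python. The helper normalize_rows is hypothetical and deliberately omits the row == row_count append case mentioned above, so it is a simplified illustration rather than the package's logic:

```python
def normalize_rows(selection, row_count):
    """Hypothetical helper resolving a row selection to a tuple of indices."""

    def resolve(index):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(index, bool) or not isinstance(index, int):
            raise TypeError(f"invalid row index type: {type(index).__name__}")
        if not -row_count <= index < row_count:
            raise IndexError(f"row index {index} out of range")
        # Map negative indexes onto their positive position
        return index % row_count

    # Slices are resolved against the full range of valid rows
    if isinstance(selection, slice):
        return tuple(range(row_count)[selection])
    # Single integer
    if isinstance(selection, int):
        return (resolve(selection),)
    # Any other iterable of integers
    return tuple(resolve(index) for index in selection)


for selection in (2, -3, [-1, 5, 0], slice(2, 5), slice(-8, -1, 2)):
    print(f"{selection} --> {normalize_rows(selection, 10)}")
```

A slice that selects nothing simply yields an empty tuple in this sketch, which is comparable to the behavior described for allow_zero_rows=True.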
- Changelog:
- Version 2.3.1 (2026-01-22):
Modified to allow validation of slice index regardless of number of rows returned when allow_zero_rows is True.
- Version 0.7.9 (2024-06-20):
New method.
- add_column_with_values(column_name: str, value=None) None[source]
Adds a new column with the specified column_name to the SQLDataModel. The new column is populated with the values provided in the value argument. If value is not provided (default), the new column is populated with NULL values.
- Parameters:
column_name (str) – The name of the new column to be added.
value – The value to populate the new column. If None (default), the column is populated with NULL values. If a valid column name is provided, the values of that column will be used to fill the new column.
- Raises:
DimensionError – If the length of the provided values does not match the number of rows in the model.
TypeError – If the data type of the provided values is not supported or translatable to an SQL data type.
Example:
import sqldatamodel as sdm # Create model from data df = sdm.from_csv('data.csv') # Add new column with default value 42 df.add_column_with_values('new_column', value=42) # Add new column by copying values from an existing column df.add_column_with_values('new_column', value='existing_column')
Note
Many other methods, including SQLDataModel.__setitem__(), rely on this method, therefore modifying it may cause unpredictable behavior.
Determination for when to copy an existing column versus when to assign the string as a value is done by SQLDataModel.__eq__() against both values.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- append_row(values: list | tuple = None) None[source]
Appends values as a new row in the SQLDataModel at the next available index based on the current max row index from SQLDataModel.indicies. If values = None, an empty row with SQL null values will be used.
- Parameters:
values (list or tuple, optional) – The values to be inserted into the row. If not provided or set to None, an empty row with SQL null values will be inserted.
- Raises:
TypeError – If values is provided and is not of type list or tuple.
DimensionError – If the number of values provided does not match the current column count.
SQLProgrammingError – If there is an issue with the SQL execution during the insertion.
- Returns:
None
Example:
import sqldatamodel as sdm # Create a rowless model df = sdm.SQLDataModel(headers=['Name', 'Age']) # Append a row with values df.append_row(['Alice', 31]) # Append another row df.append_row(['John', 48]) # View result print(df)
This will output:
┌───┬───────┬──────┐ │ │ Name │ Age │ ├───┼───────┼──────┤ │ 0 │ Alice │ 31 │ │ 1 │ John │ 48 │ └───┴───────┴──────┘ [2 rows x 2 columns]
Note
If no values are provided, None or SQL ‘null’ will be used for the values.
Rows will be appended to the bottom of the model at one index greater than the current max index.
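Appending at one index greater than the current maximum can be sketched with the standard library sqlite3 module. The table layout and the helper append_row below are assumptions made for illustration, and ValueError stands in for DimensionError:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "Name" TEXT, "Age" INTEGER)')


def append_row(con, values=None, column_count=2):
    """Hypothetical helper inserting a row at max index + 1."""
    # An omitted row becomes a row of SQL nulls
    if values is None:
        values = [None] * column_count
    if not isinstance(values, (list, tuple)):
        raise TypeError("values must be a list or tuple")
    # ValueError is a stand-in for the package's DimensionError
    if len(values) != column_count:
        raise ValueError("number of values does not match column count")
    # Next index is one greater than the current max, or 0 for an empty table
    next_idx = con.execute('SELECT COALESCE(MAX("idx") + 1, 0) FROM "sdm"').fetchone()[0]
    placeholders = ", ".join("?" for _ in range(column_count + 1))
    con.execute(f'INSERT INTO "sdm" VALUES ({placeholders})', (next_idx, *values))
    return next_idx


append_row(con, ["Alice", 31])
append_row(con, ["John", 48])
append_row(con)  # empty row of nulls
appended = con.execute('SELECT * FROM "sdm" ORDER BY "idx"').fetchall()
print(appended)
```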
- Changelog:
- Version 0.6.0 (2024-05-14):
Mirrors previous behavior of SQLDataModel.insert_row() for versions <= 0.5.2.
New method.
- apply(func: Callable) SQLDataModel[source]
Applies func to the current SQLDataModel object and returns a modified SQLDataModel by passing its current values to the arguments of func and updating them with the output.
- Parameters:
func (Callable) – A callable function to apply to the SQLDataModel.
- Raises:
TypeError – If the provided argument for func is not a valid callable.
SQLProgrammingError – If the provided function is not valid based on the current SQL datatypes.
- Returns:
A modified SQLDataModel resulting from the application of func.
- Return type:
SQLDataModel
Examples:
Applying to Single Column
import sqldatamodel as sdm # Create the SQLDataModel: df = sdm.from_csv('employees.csv', headers=['First Name', 'Last Name', 'City', 'State']) # Create the function: def uncase_name(x): return x.lower() # Apply to existing column: df['First Name'] = df['First Name'].apply(uncase_name) # existing column will be updated with new values # Or create new one by passing in a new column name: df['New Column'] = df['First Name'].apply(uncase_name) # new column will be created with returned values
Applying to Multiple Columns
import sqldatamodel as sdm

# Create the SQLDataModel:
df = sdm.from_csv('employees.csv', headers=['First Name', 'Last Name', 'City', 'State'])

# Create the function, note that ``func`` must have the same number of args as the model ``.apply()`` is called on:
def summarize_employee(first, last, city, state):
    summary = f"{first} {last} is from {city}, {state}"
    return summary

# Create a new 'Employee Summary' column for the returned values:
df['Employee Summary'] = df.apply(summarize_employee)
Applying a Built-in Function
import math import sqldatamodel as sdm # Create the SQLDataModel: df = sdm.from_csv('number-data.csv', headers=['Number']) # Apply the math.sqrt function to the original 'Number' column: df_sqrt = df.apply(math.sqrt)
Applying a Lambda Function
import sqldatamodel as sdm # Create the SQLDataModel: df = sdm.from_csv('example.csv', headers=['Column1', 'Column2']) # Create a new 'Column3' using the values returned from the lambda function: df['Column3'] = df.apply(lambda x, y: x + y) # Alternatively, an existing column can be updated in place: df['Column1'] = df['Column1'].apply(lambda x: x // 4)
Note
The number of args in the inspected signature of func must equal the current number of SQLDataModel columns, or an Exception will be raised.
Use SQLDataModel.generate_apply_function_stub() method to return a preconfigured template using current SQLDataModel columns and dtypes to assist.
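The requirement that the argument count of func matches the number of columns is similar to how a Python function is registered with an SQLite connection. The sketch below uses the standard library calls inspect.signature and sqlite3.Connection.create_function. The helper apply_to_columns, the registered name user_func and the sample table are hypothetical, and whether the package works exactly this way is an assumption:

```python
import inspect
import sqlite3


def apply_to_columns(con, table, columns, func):
    """Hypothetical helper applying ``func`` across the given columns per row."""
    # The inspected argument count must equal the number of columns
    narg = len(inspect.signature(func).parameters)
    if narg != len(columns):
        raise ValueError(f"func takes {narg} args but {len(columns)} columns were given")
    # Register the Python callable as an SQL function on this connection
    con.create_function("user_func", narg, func)
    column_list = ", ".join(f'"{col}"' for col in columns)
    sql = f'SELECT user_func({column_list}) FROM "{table}" ORDER BY rowid'
    return [row[0] for row in con.execute(sql)]


con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "people" ("first" TEXT, "last" TEXT)')
con.executemany('INSERT INTO "people" VALUES (?, ?)', [("Ada", "Lovelace"), ("Alan", "Turing")])

full_names = apply_to_columns(con, "people", ["first", "last"], lambda first, last: f"{first} {last}")
print(full_names)
```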
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- apply_function_to_column(func: Callable, column: str | int) None[source]
Applies the specified callable function (func) to the provided SQLDataModel column. The function’s output is used to update the values in the column. For broader uses or more input flexibility, see the related method SQLDataModel.apply().
- Parameters:
func (Callable) – The callable function to apply to the column.
column (str | int) – The name or index of the column to which the function will be applied.
- Raises:
TypeError – If the provided column argument is not a valid type (str or int).
IndexError – If the provided column index is outside the valid range of column indices.
ValueError – If the provided column name is not valid for the current model.
SQLProgrammingError – If the provided function return types or arg count is invalid or incompatible to SQL types.
- Returns:
None
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('data.csv')

# Apply upper() method using lambda function to column ``name``
df.apply_function_to_column(lambda x: x.upper(), column='name')

# Apply addition through a single-argument lambda function to column at index 1
df.apply_function_to_column(lambda x: x + 1, column=1)
Note
This method is a simplified version of the SQLDataModel.apply() method, which can be used for arbitrary function params and inputs.
If providing a function name, ensure it can be used as a valid sqlite3 identifier for the instance’s connection, otherwise SQLProgrammingError will be raised.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- astype(dtype: Callable | Type | Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str']) SQLDataModel[source]
Casts the model data into the specified python dtype.
- Parameters:
dtype (Callable|Type|Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str']) – The target python data type to cast the values to.
- Raises:
ValueError – If dtype is a string and not one of ‘bool’, ‘bytes’, ‘date’, ‘datetime’, ‘float’, ‘int’, ‘None’, ‘str’.
TypeError – If dtype is a Type object that does not map to the current values, such as trying to convert a string column using the built-in float type.
- Returns:
The data cast to the specified type as a new SQLDataModel.
- Return type:
SQLDataModel
Warning
Type casting will coerce any nonconforming values to the dtype being set; this means data will be lost when casting values to incompatible types.
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height', 'Hired']
data = [
    ('John', 30, 175.3, 'True'),
    ('Alice', 28, 162.0, 'True'),
    ('Travis', 35, 185.8, 'False')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# See what we're working with
print(df)
This will output:
┌────────┬──────┬─────────┬───────┐
│ Name   │  Age │  Height │ Hired │
├────────┼──────┼─────────┼───────┤
│ John   │   30 │  175.30 │ True  │
│ Alice  │   28 │  162.00 │ True  │
│ Travis │   35 │  185.80 │ False │
└────────┴──────┴─────────┴───────┘
[3 rows x 4 columns]
We can return the values as new types or save them to a column:
# Convert the string based 'Hired' column to boolean values
df['Hired'] = df['Hired'].astype('bool')

# Let's also create a new 'Height' column, this time as an integer
df['Height int'] = df['Height'].astype('int')

# See the new values and their types
print(df)
This will output:
┌────────┬──────┬─────────┬───────┬────────────┐
│ Name   │  Age │  Height │ Hired │ Height int │
├────────┼──────┼─────────┼───────┼────────────┤
│ John   │   30 │  175.30 │     1 │        175 │
│ Alice  │   28 │  162.00 │     1 │        162 │
│ Travis │   35 │  185.80 │     0 │        185 │
└────────┴──────┴─────────┴───────┴────────────┘
[3 rows x 5 columns]
Types can also be passed directly to dtype:
# Convert 'Age' directly to float using the built-in type:
df['Age float'] = df['Age'].astype(float)

# View updated model
print(df)
This will output the result of mapping the built-in float type to ‘Age’ as a new column:
┌────────┬─────┬─────────┬───────┬────────────┬───────────┐
│ Name   │ Age │  Height │ Hired │ Height int │ Age float │
├────────┼─────┼─────────┼───────┼────────────┼───────────┤
│ John   │  30 │  175.30 │     1 │        175 │     30.00 │
│ Alice  │  28 │  162.00 │     1 │        162 │     28.00 │
│ Travis │  35 │  185.80 │     0 │        185 │     35.00 │
└────────┴─────┴─────────┴───────┴────────────┴───────────┘
[3 rows x 6 columns]
Note
Unless the returned values are saved as a new column, using this method does not change the type currently assigned to the underlying column; to modify the column type, use SQLDataModel.set_column_dtypes() instead.
Any None or null values encountered will not be coerced to the specified dtype, see SQLDataModel.fillna() for handling and filling null values appropriately.
When passing a type directly, dtype=Type, the type must be a Callable that can be mapped directly to a value like the built-in str, int, float and bool types.
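The behavior described for callables and null values can be pictured with a few lines of plain Python. This is a sketch under the assumption that a callable dtype is simply mapped over each non-null value; the helper cast_values is a hypothetical name and does not reproduce the package's handling of string dtypes such as 'bool'.

```python
def cast_values(values, dtype):
    """Map the callable ``dtype`` over ``values`` while leaving None untouched."""
    return [None if value is None else dtype(value) for value in values]

ages = [30, None, 28]
heights = [175.3, 162.0, 185.8]

ages_as_float = cast_values(ages, float)
heights_as_int = cast_values(heights, int)

print(ages_as_float)   # [30.0, None, 28.0]
print(heights_as_int)  # [175, 162, 185]
```

The truncated heights show the data loss the warning refers to: the fractional part cannot be recovered once the values are cast to int.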
- Changelog:
- Version 0.7.6 (2024-06-16):
Modified to allow Callable or Type to be provided directly for dtype argument to map to data and return as new model for broader type conversion.
- Version 0.2.1 (2024-03-24):
New method.
- column_alignment[source]
The column alignment to use for string representations of the data, value must be one of ['dynamic', 'left', 'center', 'right']. Default is 'dynamic', using right-alignment for numeric columns and left-alignment for all others.
- Type:
str
- concat(other: SQLDataModel | list | tuple, inplace: bool = True) None | SQLDataModel[source]
Concatenates the provided data to the SQLDataModel along the row axis, returning a new model or modifying the existing instance inplace.
- Parameters:
other (SQLDataModel | list | tuple) – The SQLDataModel, list, or tuple to concatenate or append.
inplace (bool, optional) – If True (default), performs the concatenation in-place, modifying the current model. If False, returns a new SQLDataModel instance with the concatenated result.
- Returns:
None when inplace = True and SQLDataModel when inplace = False
- Return type:
None or SQLDataModel
- Raises:
TypeError – If the other argument is not one of type SQLDataModel, list, or tuple.
ValueError – If other is a list or tuple with insufficient data where the column dimension is < 1.
DimensionError – If the column count of the current model does not match the column count of the other model or tuple.
Example:
import sqldatamodel as sdm

# Datasets a and b
data_a = (['A', 1], ['B', 2])
data_b = (['C', 3], ['D', 4])

# Create the models
df_a = sdm.SQLDataModel(data_a, headers=['letter', 'number'])
df_b = sdm.SQLDataModel(data_b, headers=['letter', 'number'])

# Concatenate the two models
df_ab = df_a.concat(df_b, inplace=False)

# View result
print(df_ab)
This will output:
┌────────┬────────┐
│ letter │ number │
├────────┼────────┤
│ A      │      1 │
│ B      │      2 │
│ C      │      3 │
│ D      │      4 │
└────────┴────────┘
[4 rows x 2 columns]
Concatenation can be done using other objects as well:
# List or tuples can also be used directly
data_e = ['E', 5]

# Append in place
df_ab.concat(data_e)

# View result
print(df_ab)
This will output:
┌───┬────────┬────────┐
│   │ letter │ number │
├───┼────────┼────────┤
│ 0 │ A      │      1 │
│ 1 │ B      │      2 │
│ 2 │ C      │      3 │
│ 3 │ D      │      4 │
│ 4 │ E      │      5 │
└───┴────────┴────────┘
[5 rows x 2 columns]
Note
Models must be of compatible dimensions with equal column_count, or equivalent dimension if other is a list or tuple.
Headers are inherited from the model calling the SQLDataModel.concat() method, whether done inplace or returned as a new instance.
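The dimension requirement can be sketched on plain lists of tuples. This is an illustration only: concat_rows is a hypothetical helper, and it raises the built-in ValueError where the package documents its own DimensionError.

```python
def concat_rows(rows, other):
    """Append ``other`` to ``rows`` along the row axis after checking widths."""
    if not isinstance(other, (list, tuple)):
        raise TypeError("other must be a list or tuple")
    # Treat a flat sequence such as ['E', 5] as a single row
    if other and not isinstance(other[0], (list, tuple)):
        other = [other]
    width = len(rows[0])
    for row in other:
        if len(row) != width:
            raise ValueError(f"expected {width} columns, got {len(row)}")
    return [tuple(row) for row in rows] + [tuple(row) for row in other]

rows_ab = concat_rows([('A', 1), ('B', 2)], [('C', 3), ('D', 4)])
rows_abe = concat_rows(rows_ab, ['E', 5])
print(rows_abe)  # [('A', 1), ('B', 2), ('C', 3), ('D', 4), ('E', 5)]
```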
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- contains(pat: str | Iterable[str], case: bool = True) set[int][source]
Return the row indices that contain the specified pattern(s) in any column from the model, converting to str(value) for comparison.
- Parameters:
pat (str | Iterable[str]) – The pattern or iterable of patterns to search for within the data.
case (bool, optional) – If True (default), the search is case-sensitive. If False, the search is case-insensitive.
- Raises:
TypeError – If argument for pat is not of type ‘str’ or an iterable of type ‘str’ representing the substring pattern(s).
- Returns:
Set of row indices containing values that match the pattern(s).
- Return type:
set[int]
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Sex', 'City']
data = [
    ('Mike', 31, 'M', 'Chicago'),
    ('John', 25, 'M', 'Dayton'),
    ('Alice', 27, 'F', 'Boston'),
    ('Sarah', 35, 'F', 'Houston'),
    ('Bobby', 42, 'M', 'Chicago'),
    ('Steve', 28, 'F', 'Austin'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter for rows containing the string 'Chicago'
matching_indicies = df['City'].contains('Chicago')

# Apply filter to model
df_chicago = df[matching_indicies]

# View result
print(df_chicago)
This will output the result of applying the filter to the model:
┌───┬───────┬─────┬─────┬─────────┐ │ │ Name │ Age │ Sex │ City │ ├───┼───────┼─────┼─────┼─────────┤ │ 0 │ Mike │ 31 │ M │ Chicago │ │ 4 │ Bobby │ 42 │ M │ Chicago │ └───┴───────┴─────┴─────┴─────────┘ [2 rows x 4 columns]
Instead of searching a single column, the entire model can be searched:
# Method can also search all columns, and be applied directly
df_with_e = df[df.contains('E', case=False)]

# View result
print(df_with_e)
This will output the result of a case-insensitive search:
┌───┬───────┬─────┬─────┬─────────┐ │ │ Name │ Age │ Sex │ City │ ├───┼───────┼─────┼─────┼─────────┤ │ 0 │ Mike │ 31 │ M │ Chicago │ │ 2 │ Alice │ 27 │ F │ Boston │ │ 5 │ Steve │ 28 │ F │ Austin │ └───┴───────┴─────┴─────┴─────────┘ [3 rows x 4 columns]
This can be used in combination with the setitem syntax to selectively update values as well:
# Create a 'State' column with a default value
df['State'] = None

# Filter and set the values that contain the pattern
df[df.contains('Chicago'), 'State'] = 'Illinois'

# Multiple conditions can be used
tx_1 = df.contains('Houston')
tx_2 = df.contains('Austin')

# Then chained together using set notation
df[(tx_1 | tx_2), 'State'] = 'Texas'

# Alternatively, an iterable of patterns can be provided
df[df.contains(['Houston','Austin']), 'State'] = 'Texas'
Note
Any non-string values are converted using str(value) for comparisons only.
See SQLDataModel.__eq__() for strict equality comparison operations.
See SQLDataModel.__and__() for more details on bitwise and set operations.
See SQLDataModel.__setitem__() for more details on syntax df[row, column] = value and correct usage.
See SQLDataModel.startswith() and SQLDataModel.endswith() for additional string methods.
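The matching semantics described for this method (substring search over str(value), returning a set of row indices that can be combined with set operators) can be sketched in plain Python. The function below is a hypothetical stand-in written for this illustration, not the package's implementation, and it assumes simple substring matching.

```python
def contains(rows, pat, case=True):
    """Return the set of row indices where any value contains a pattern."""
    patterns = [pat] if isinstance(pat, str) else list(pat)
    if not all(isinstance(p, str) for p in patterns):
        raise TypeError("pat must be a str or an iterable of str")
    if not case:
        patterns = [p.lower() for p in patterns]
    matches = set()
    for index, row in enumerate(rows):
        for value in row:
            # Non-string values are converted for the comparison only
            text = str(value) if case else str(value).lower()
            if any(p in text for p in patterns):
                matches.add(index)
                break
    return matches

data = [
    ('Mike', 31, 'M', 'Chicago'),
    ('John', 25, 'M', 'Dayton'),
    ('Alice', 27, 'F', 'Boston'),
    ('Sarah', 35, 'F', 'Houston'),
    ('Bobby', 42, 'M', 'Chicago'),
    ('Steve', 28, 'F', 'Austin'),
]

chicago = contains(data, 'Chicago')
with_e = contains(data, 'E', case=False)
texas = contains(data, 'Houston') | contains(data, 'Austin')
print(chicago, with_e, texas)
```

Returning a set is what makes the union in the last line possible, and passing both patterns as an iterable yields the same indices.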
- Changelog:
- Version 0.7.8 (2024-06-18):
New method.
- copy(data_only: bool = False) SQLDataModel[source]
Returns a deep copy of the current model as a new SQLDataModel.
- Parameters:
data_only (bool) – If True, only the data is copied, otherwise display and styling parameters are included. Default is False.
- Returns:
A cloned copy from the original as a new SQLDataModel.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the original model with list styling
df = sdm.SQLDataModel(data, headers, table_style='list')

# Create two copies, one full and one with data only
copy_full = df.copy()
copy_data = df.copy(data_only=True)

# View both copies
print(copy_full)
print(copy_data)
This will output both copies, with copy_full including any styling parameters such as table_style='list':
   Name    Age   Height
-  ------  ---  -------
0  John     30   175.30
1  Alice    28   162.00
2  Travis   35   185.80
With the output for copy_data containing only the original model’s data:
┌───┬────────┬─────┬─────────┐
│   │ Name   │ Age │  Height │
├───┼────────┼─────┼─────────┤
│ 0 │ John   │  30 │  175.30 │
│ 1 │ Alice  │  28 │  162.00 │
│ 2 │ Travis │  35 │  185.80 │
└───┴────────┴─────┴─────────┘
Note
Model headers and dtypes are considered part of the model data and are included when data_only=True.
Default behavior, data_only=False, includes the following additional display parameters:
SQLDataModel.display_max_rows: The maximum number of rows to display.
SQLDataModel.min_column_width: The minimum width of columns when displaying the model.
SQLDataModel.max_column_width: The maximum width of columns when displaying the model.
SQLDataModel.column_alignment: The alignment of columns (‘left’, ‘center’, ‘right’ or ‘dynamic’).
SQLDataModel.display_color: The color to use when displaying the table, None by default.
SQLDataModel.display_index: True if displaying index column, False otherwise.
SQLDataModel.display_float_precision: The precision for displaying floating-point numbers.
SQLDataModel.table_style: The table styling format to use for string representations of the model.
- Changelog:
- Version 0.4.2 (2024-05-03):
New method.
- count() SQLDataModel[source]
Returns a new SQLDataModel containing the counts of non-null values for each column in a row-wise orientation.
- Returns:
A new SQLDataModel containing the counts of non-null values in each column.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Sample data with missing values
headers = ['Name', 'Age', 'Gender', 'Tenure']
data = [
    ('Alice', 25, 'Female', 1.0),
    ('Bob', None, 'Male', 2.7),
    ('Charlie', 30, 'Male', None),
    ('David', None, 'Male', 3.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get counts
counts = df.count()

# View result
print(counts)
This will output the count of all non-null values for each column:
┌──────┬─────┬────────┬────────┐
│ Name │ Age │ Gender │ Tenure │
├──────┼─────┼────────┼────────┤
│    4 │   2 │      4 │      3 │
└──────┴─────┴────────┴────────┘
[1 rows x 4 columns]
Note
See SQLDataModel.count_unique() for column-wise count of unique, null and total values for each column.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- count_unique() SQLDataModel[source]
Returns a new SQLDataModel containing the total counts and unique values for each column in the model for both null and non-null values.
- Metrics:
'column' contains the names of the columns counted.
'na' contains the total number of null values in the column.
'unique' contains the total number of unique values in the column.
'count' contains the total number of non-null values in the column.
'total' contains the total number of all null and non-null values in the column.
- Returns:
A new SQLDataModel containing columns ‘column’, ‘na’, ‘unique’, ‘count’ and ‘total’ representing the column name, null values count, unique values count, non-null values count, and total values count, respectively.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Gender']
data = [
    ('Alice', 25, 'Female'),
    ('Bob', 30, None),
    ('Alice', 25, 'Female')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get the value count information
count_model = df.count_unique()

# View the count information
print(count_model)
This will output:
┌────────┬──────┬────────┬───────┬───────┐
│ column │   na │ unique │ count │ total │
├────────┼──────┼────────┼───────┼───────┤
│ Name   │    0 │      2 │     3 │     3 │
│ Age    │    0 │      2 │     3 │     3 │
│ Gender │    1 │      1 │     2 │     3 │
└────────┴──────┴────────┴───────┴───────┘
[3 rows x 5 columns]
Note
See SQLDataModel.count() for the count of non-null values for each column in a row-wise orientation.
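The five metrics map naturally onto SQL aggregates. The sketch below uses the standard sqlite3 module directly on the sample data; the table name t and the loop are assumptions made for the illustration and say nothing about how the package builds its own query.

```python
import sqlite3

headers = ['Name', 'Age', 'Gender']
data = [
    ('Alice', 25, 'Female'),
    ('Bob', 30, None),
    ('Alice', 25, 'Female'),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("Name", "Age", "Gender")')
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

summary = []
for header in headers:
    # COUNT(col) and COUNT(DISTINCT col) skip NULLs, COUNT(*) does not
    na, unique, count, total = conn.execute(
        f'SELECT SUM("{header}" IS NULL), COUNT(DISTINCT "{header}"), '
        f'COUNT("{header}"), COUNT(*) FROM t'
    ).fetchone()
    summary.append((header, na, unique, count, total))

for row in summary:
    print(row)
```

For this data the na and count values always add up to total, which is a quick sanity check on any similar summary.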
- Changelog:
- Version 0.3.2 (2024-04-02):
Renamed method from counts to count_unique for more precise definition.
New method.
- data(index: bool = False, include_headers: bool = False, strict_2d: bool = False) list[tuple][source]
Returns the SQLDataModel data as a list of tuples for multiple rows, a single tuple for individual rows, or a single item for individual cells. Data is returned without index and headers by default, use include_headers=True or index=True to modify.
- Parameters:
index (bool, optional) – If True, includes the index in the result; if False, excludes the index. Default is False.
include_headers (bool, optional) – If True, includes column headers in the result; if False, excludes headers. Default is False.
strict_2d (bool, optional) – If True, returns data as a 2-dimensional list of tuples regardless of data dimension. Default is False.
- Returns:
The data currently stored in the model as a list of tuples.
- Return type:
list[tuple]
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers, display_float_precision=2)

# View full table
print(df)
This will output:
┌────────┬──────┬─────────┐
│ Name   │  Age │  Height │
├────────┼──────┼─────────┤
│ John   │   30 │  175.30 │
│ Alice  │   28 │  162.00 │
│ Travis │   35 │  185.80 │
└────────┴──────┴─────────┘
[3 rows x 3 columns]
Get data for specific row:
# Grab data from single row
row_data = df[0].data()

# View it
print(row_data)
This will output the row as a tuple of values:
('John', 30, 175.3)
Get data for specific column:
# Grab data from single column
col_data = df['Name'].data()

# View it
print(col_data)
This will output the column values as a list of tuples:
[('John',), ('Alice',), ('Travis',)]
Note
Many other SQLDataModel methods rely on this method, changing it will lead to undefined behavior.
See related SQLDataModel.from_data() for creating a new SQLDataModel from existing data sources.
Use strict_2d = True to always return data as a list of tuples regardless of data dimension.
- Changelog:
- Version 0.10.0 (2024-06-29):
Modified to use SQLDataModel._generate_sql_stmt_fetchall() to leverage deterministic behavior of method.
- Version 0.5.0 (2024-05-09):
Added strict_2d parameter to allow predictable return type regardless of data dimension.
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.5 (2023-11-24):
New method.
- deduplicate(subset: list[str] = None, reset_index: bool = True, keep_first: bool = True, inplace: bool = True) None | SQLDataModel[source]
Removes duplicate rows from the SQLDataModel based on the specified subset of columns. Deduplication occurs inplace by default, otherwise use inplace=False to return a new SQLDataModel.
- Parameters:
subset (list[str], optional) – List of columns to consider when identifying duplicates. If None, all columns are considered. Defaults to None.
reset_index (bool, optional) – If True, resets the index after deduplication starting at 0; otherwise retains current indices. Defaults to True.
keep_first (bool, optional) – If True, keeps the first occurrence of each duplicated row; otherwise, keeps the last occurrence. Defaults to True.
inplace (bool, optional) – If True, modifies the current SQLDataModel in-place; otherwise, returns a new SQLDataModel without duplicates. Defaults to True.
- Raises:
ValueError – If a column specified in subset is not found in the SQLDataModel.
- Returns:
If inplace = True, the method modifies the current SQLDataModel in-place and returns None; otherwise, if inplace = False, a new SQLDataModel is returned.
- Return type:
None or SQLDataModel
Examples:
Based on Single Column
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Deduplicate based on a specific column
df.deduplicate(subset='ID', keep_first=True, inplace=True)
Based on Multiple Columns
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Deduplicate based on multiple columns and save to keep both models
df_deduped = df.deduplicate(subset=['ID', 'Name'], keep_first=False, inplace=False)
Note
Ordering for keep_first is determined by the current SQLDataModel.sql_idx order of the instance.
For multiple columns ordering is done sequentially favoring first index in subset, then i+1, …, to i+len(subset)
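One common way to express "keep the first or last occurrence per subset" in SQLite is to group on the subset columns and keep the smallest or largest row identifier. The sketch below uses the standard sqlite3 module and the built-in rowid as a stand-in for the model's index column; the helper deduplicate and the table items are hypothetical, and this is not presented as the package's own query.

```python
import sqlite3

def deduplicate(conn, table, subset, keep_first=True):
    """Delete duplicate rows, keeping the first or last rowid per subset group."""
    pick = "MIN" if keep_first else "MAX"
    columns = ", ".join(f'"{column}"' for column in subset)
    conn.execute(
        f'DELETE FROM "{table}" WHERE rowid NOT IN '
        f'(SELECT {pick}(rowid) FROM "{table}" GROUP BY {columns})'
    )

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE items ("ID" INTEGER, "Name" TEXT, "Value" REAL)')
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "a", 1.0), (2, "b", 2.0), (1, "a", 3.0), (2, "c", 4.0)],
)

# Keep the last occurrence of each (ID, Name) pair
deduplicate(conn, "items", ["ID", "Name"], keep_first=False)
remaining = conn.execute("SELECT * FROM items ORDER BY rowid").fetchall()
print(remaining)  # [(2, 'b', 2.0), (1, 'a', 3.0), (2, 'c', 4.0)]
```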
- Changelog:
- Version 0.1.5 (2023-11-24):
New method.
- describe(exclude_columns: str | list = None, exclude_dtypes: list[Literal['str', 'int', 'float', 'date', 'datetime', 'bool']] = None, ignore_na: bool = True, **kwargs) SQLDataModel[source]
Generates descriptive statistics for columns in the SQLDataModel instance based on column dtype including count, unique values, top value, frequency, mean, standard deviation, minimum, 25th, 50th, 75th percentiles, maximum and dtype for specified column.
- Parameters:
exclude_columns (str | list, optional) – Columns to exclude from the analysis. Default is None.
exclude_dtypes (list[Literal["str", "int", "float", "date", "datetime", "bool"]], optional) – Data types to exclude from the analysis. Default is None.
ignore_na (bool, optional) – If True, ignores NA like values (‘NA’, ‘ ‘, ‘None’) when computing statistics. Default is True.
**kwargs – Additional keyword arguments to be passed to the execute_fetch method.
- Statistics Described:
count: Total number of non-null values for specified column
unique: Total number of unique values for specified column
top: Top value represented for specified column, ties broken arbitrarily
freq: Frequency of corresponding value represented in ‘top’ metric
mean: Mean as calculated by summing all values and dividing by ‘count’
std: Standard Deviation for specified column
  Uncorrected sample standard deviation for int, float dtypes
  Mean time difference represented in number of days for date, datetime dtypes
  ‘NaN’ for all other dtypes
min: Minimum value for specified column
  Least value for int, float dtypes
  Least value sorted by alphabetical ascending for str dtypes
  Earliest date or datetime for date, datetime dtypes
p25: Percentile, 25th
  Max first bin value as determined by quartered binning of values for int, float dtypes
  ‘NaN’ for all other dtypes
p50: Percentile, 50th
  Max second bin value as determined by quartered binning of values for int, float dtypes
  ‘NaN’ for all other dtypes
p75: Percentile, 75th
  Max third bin value as determined by quartered binning of values for int, float dtypes
  ‘NaN’ for all other dtypes
max: Maximum value for specified column
  Greatest value for int, float dtypes
  Greatest value sorted by alphabetical ascending for str dtypes
  Latest date or datetime for date, datetime dtypes
dtype: Datatype of specified column
  Python datatype as determined by relevant class __name__ attribute, e.g. ‘float’ or ‘int’
  dtypes can be excluded by using exclude_dtypes parameter
- Returns:
A new SQLDataModel containing a comprehensive set of descriptive statistics for selected columns.
- Return type:
SQLDataModel
Note
Standard deviation is calculated using uncorrected sample standard deviation for numeric dtypes, and timediff in days for datetime dtypes
Ties in unique, top and freq columns are broken arbitrarily as determined by first ordering of values prior to calling describe()
Ties encountered when binning for p25, p50, p75 will favor lower bins for data that cannot be quartered cleanly
Metrics for count, min, p25, p50, p75 and max include non-null values only
Using ignore_na=True only affects inclusion of ‘NA like’ values such as empty strings
Floating point precision determined by SQLDataModel.display_float_precision attribute
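One reading of the quartered binning described for p25, p50 and p75 is to sort the non-null values, split them into four bins with any remainder going to the lower bins, and report the largest value of each of the first three bins. The sketch below implements that reading in plain Python; quartile_bin_maxima is a hypothetical helper, and the sample values are the service_years and age columns from the employee table used in the example for this method.

```python
def quartile_bin_maxima(values):
    """Return the maxima of the first three of four equal-as-possible bins."""
    ordered = sorted(value for value in values if value is not None)
    if len(ordered) < 4:
        raise ValueError("at least four non-null values are required")
    base, extra = divmod(len(ordered), 4)
    maxima = []
    end = 0
    for bin_index in range(3):
        # Lower bins receive one extra value when the data cannot be quartered cleanly
        end += base + (1 if bin_index < extra else 0)
        maxima.append(ordered[end - 1])
    return tuple(maxima)

service_years = [3.02, 5.01, 4.65, 3.81, 2.95, 4.61, 5.34, 3.87, 2.73, 4.58]
ages = [56, 41, 26, 35, 42, 56, 50, 24, 28, 43]

print(quartile_bin_maxima(service_years))  # (3.02, 4.58, 4.65)
print(quartile_bin_maxima(ages))           # (28, 42, 50)
```

Note that this binning differs from interpolated percentiles such as those returned by NumPy, which is one reason a dedicated library is preferable for statistical work.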
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('employees.csv')

# View all 10 rows
print(df)
This will output:
┌───┬──────────────────┬────────────┬─────────────┬───────────────┬────────┬─────────────────────┐ │ │ name │ hire_date │ country │ service_years │ age │ last_update │ ├───┼──────────────────┼────────────┼─────────────┼───────────────┼────────┼─────────────────────┤ │ 0 │ Pamela Berg │ 2007-06-06 │ New Zealand │ 3.02 │ 56 │ 2023-08-12 17:13:46 │ │ 1 │ Mason Hoover │ 2009-04-19 │ Australia │ 5.01 │ 41 │ 2023-05-18 01:29:44 │ │ 2 │ Veda Suarez │ 2007-07-02 │ Ukraine │ 4.65 │ 26 │ 2023-12-09 15:38:01 │ │ 3 │ John Smith │ 2017-08-12 │ New Zealand │ 3.81 │ 35 │ 2023-03-10 18:23:56 │ │ 4 │ Xavier McCoy │ 2021-04-03 │ France │ 2.95 │ 42 │ 2023-09-27 11:39:08 │ │ 5 │ John Smith │ 2020-10-11 │ Germany │ 4.61 │ 56 │ 2023-12-09 18:41:52 │ │ 6 │ Abigail Mays │ 2021-07-25 │ Costa Rica │ 5.34 │ 50 │ 2023-02-11 16:43:07 │ │ 7 │ Rama Galloway │ 2009-02-09 │ Italy │ 3.87 │ 24 │ 2023-03-13 16:08:48 │ │ 8 │ Lucas Rodriquez │ 2018-06-19 │ New Zealand │ 2.73 │ 28 │ 2023-03-17 01:45:22 │ │ 9 │ Hunter Donaldson │ 2015-12-18 │ Belgium │ 4.58 │ 43 │ 2023-04-06 03:22:54 │ └───┴──────────────────┴────────────┴─────────────┴───────────────┴────────┴─────────────────────┘ [10 rows x 6 columns]
Now that we have our SQLDataModel, we can generate some statistics:
# Generate statistics
df_described = df.describe()

# View stats
print(df_described)
This will output:
┌────────┬──────────────┬─────────────┬─────────────┬───────────────┬────────┬─────────────────────┐ │ metric │ name │ hire_date │ country │ service_years │ age │ last_update │ ├────────┼──────────────┼─────────────┼─────────────┼───────────────┼────────┼─────────────────────┤ │ count │ 10 │ 10 │ 10 │ 10 │ 10 │ 10 │ │ unique │ 9 │ 10 │ 8 │ 10 │ 9 │ 10 │ │ top │ John Smith │ 2021-07-25 │ New Zealand │ 5.34 │ 56 │ 2023-12-09 18:41:52 │ │ freq │ 2 │ 1 │ 3 │ 1 │ 2 │ 1 │ │ mean │ NaN │ 2014-11-24 │ NaN │ 4.06 │ 40 │ 2023-06-16 19:18:39 │ │ std │ NaN │ 2164.4 days │ NaN │ 0.92 │ 11 │ 117.58 days │ │ min │ Abigail Mays │ 2007-06-06 │ Australia │ 2.73 │ 24 │ 2023-02-11 16:43:07 │ │ p25 │ NaN │ 2009-02-09 │ NaN │ 3.02 │ 28 │ 2023-03-13 16:08:48 │ │ p50 │ NaN │ 2017-08-12 │ NaN │ 4.58 │ 42 │ 2023-05-18 01:29:44 │ │ p75 │ NaN │ 2020-10-11 │ NaN │ 4.65 │ 50 │ 2023-09-27 11:39:08 │ │ max │ Xavier McCoy │ 2021-07-25 │ Ukraine │ 5.34 │ 56 │ 2023-12-09 18:41:52 │ │ dtype │ str │ date │ str │ float │ int │ datetime │ └────────┴──────────────┴─────────────┴─────────────┴───────────────┴────────┴─────────────────────┘ [12 rows x 7 columns]
Specific columns or data types can be excluded from result:
# Set filters to exclude all str dtypes and the 'hire_date' column:
df_described = df.describe(exclude_dtypes=['str'], exclude_columns=['hire_date'])

# View statistics
print(df_described)
This will output:
┌────────┬───────────────┬────────┬─────────────────────┐ │ metric │ service_years │ age │ last_update │ ├────────┼───────────────┼────────┼─────────────────────┤ │ count │ 10 │ 10 │ 10 │ │ unique │ 10 │ 9 │ 10 │ │ top │ 5.34 │ 56 │ 2023-10-28 05:42:43 │ │ freq │ 1 │ 2 │ 1 │ │ mean │ 4.06 │ 40 │ 2023-08-11 23:18:12 │ │ std │ 0.92 │ 11 │ 73.15 days │ │ min │ 2.73 │ 24 │ 2023-04-07 23:56:06 │ │ p25 │ 3.02 │ 28 │ 2023-06-02 14:36:19 │ │ p50 │ 4.58 │ 42 │ 2023-09-09 19:18:38 │ │ p75 │ 4.65 │ 50 │ 2023-10-09 19:34:55 │ │ max │ 5.34 │ 56 │ 2023-10-28 05:42:43 │ │ dtype │ float │ int │ datetime │ └────────┴───────────────┴────────┴─────────────────────┘ [12 rows x 4 columns]
Important
Generally, do not rely on SQLDataModel to do statistics, use NumPy or a real scientific computing library instead.
Note
Use SQLDataModel.infer_dtypes() to cast columns to their apparent data type, or set it manually with SQLDataModel.set_column_dtypes() to convert columns to different data types.
Statistics for date and datetime can be unpredictable if formatting used is inconsistent with conversion to Julian days or if column data type is incorrect.
- Changelog:
- Version 0.6.3 (2024-05-16):
Modified model to output values as string data types and set columns to right-aligned if arguments are not present in kwargs to retain metric resolution while having numeric alignment.
- Version 0.1.9 (2024-03-19):
New method.
- display_color[source]
The display color to use for string representations of the model. Default is None, using the standard terminal color.
- Type:
ANSIColor
- display_float_precision[source]
The floating point precision to use for string representations of the table, does not affect the actual floating point values stored in the model. Default is 2.
- Type:
int
- display_index[source]
Determines whether the index column is displayed when string representations of the table are generated. Default is True.
- Type:
bool
- drop_column(column: int | str | list, inplace: bool = True) None | SQLDataModel[source]
Drops the specified column(s) from the SQLDataModel. Values for column can be a single column name or index, or a list of multiple column names or indices to drop from the model.
- Parameters:
column (int | str | list) – The index, name, or list of indices/names of the column(s) to drop.
inplace (bool) – If True, drops the column(s) in-place and updates the model metadata. If False, returns a new SQLDataModel object without the dropped column(s) and does not modify the original object. Default is True.
- Returns:
If inplace is True, returns None. Otherwise, returns a new SQLDataModel object without the dropped column(s).
- Return type:
None | SQLDataModel
- Raises:
TypeError – If the column parameter is not of type ‘int’, ‘str’, or a list containing equivalent types.
IndexError – If any provided column index is outside the current column range.
ValueError – If any provided column name is not found in the model’s headers.
Examples:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Gender', 'City']
data = [
    ('Alice', 30, 'Female', 'Milwaukee'),
    ('Sarah', 35, 'Female', 'Houston'),
    ('Mike', 28, 'Male', 'Atlanta'),
    ('John', 25, 'Male', 'Boston'),
    ('Bob', 22, 'Male', 'Chicago'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Drop the 'Gender' column
df.drop_column('Gender')

# View updated model
print(df)
This will output:
┌───────┬──────┬───────────┐ │ Name │ Age │ City │ ├───────┼──────┼───────────┤ │ Alice │ 30 │ Milwaukee │ │ Sarah │ 35 │ Houston │ │ Mike │ 28 │ Atlanta │ │ John │ 25 │ Boston │ │ Bob │ 22 │ Chicago │ └───────┴──────┴───────────┘ [5 rows x 3 columns]
Dropping multiple columns:
# Recreate the model from the original sample data
df = sdm.SQLDataModel(data, headers)

# Drop first and last columns by index
df.drop_column([0,-1])

# View updated model
print(df)
This will output:
┌──────┬────────┐ │ Age │ Gender │ ├──────┼────────┤ │ 30 │ Female │ │ 35 │ Female │ │ 28 │ Male │ │ 25 │ Male │ │ 22 │ Male │ └──────┴────────┘ [5 rows x 2 columns]
Drop columns and return as a new SQLDataModel:
# Recreate the model from the original sample data
df = sdm.SQLDataModel(data, headers)

# Drop the multiple columns and return as a new model
df = df.drop_column(['Age','Gender'], inplace=False)

# View updated model
print(df)
This will output:
┌───────┬───────────┐ │ Name │ City │ ├───────┼───────────┤ │ Alice │ Milwaukee │ │ Sarah │ Houston │ │ Mike │ Atlanta │ │ John │ Boston │ │ Bob │ Chicago │ └───────┴───────────┘ [5 rows x 2 columns]
Note
Arguments for column can be a single str or int, a list[str] of column names, or a list[int] of column indices, but names and indices cannot be combined in the same list. For example, passing column = ['First Name', 3] will raise a TypeError exception.
The equivalent of this method can also be achieved by simply indexing the required rows and columns using sdm[rows, column] notation, see SQLDataModel.__getitem__() for additional details.
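The argument rules in the note and in the Raises section can be sketched as a small validation routine. This is an illustration in plain Python only; resolve_columns is a hypothetical helper that mirrors the documented exceptions, not the package's internal code.

```python
def resolve_columns(headers, column):
    """Translate a name, index, or homogeneous list of either into column names."""
    targets = column if isinstance(column, list) else [column]
    all_names = all(type(item) is str for item in targets)
    all_indices = all(type(item) is int for item in targets)
    if not targets or not (all_names or all_indices):
        raise TypeError("column must be str, int, or a list of only str or only int")
    resolved = []
    for item in targets:
        if isinstance(item, int):
            # Negative indices count from the end, as with Python sequences
            if not -len(headers) <= item < len(headers):
                raise IndexError(f"column index {item} is out of range")
            resolved.append(headers[item])
        else:
            if item not in headers:
                raise ValueError(f"column {item!r} not found in headers")
            resolved.append(item)
    return resolved

headers = ['Name', 'Age', 'Gender', 'City']
to_drop = resolve_columns(headers, [0, -1])
kept = [header for header in headers if header not in to_drop]
print(to_drop, kept)  # ['Name', 'City'] ['Age', 'Gender']
```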
- Changelog:
- Version 0.2.3 (2024-03-28):
New method.
- drop_row(row: int | Iterable[int], inplace: bool = True, ignore_index: bool = False) None | SQLDataModel[source]
Drops the specified row(s) from the SQLDataModel. Values for row can be a single row index, or an iterable collection of multiple row indices to drop.
- Parameters:
row (int | Iterable[int]) – The row index or row indices to drop.
inplace (bool, optional) – If True, drops the row(s) in-place and updates the model metadata. If False, returns a new SQLDataModel object without the dropped row(s). Default is True.
ignore_index (bool, optional) – If True, drops the row(s) and ignores the index for the resulting model. Default is False, keeping original indices in new model.
- Returns:
If inplace is True, returns None. Otherwise, returns a new SQLDataModel object without the dropped row(s).
- Return type:
None | SQLDataModel
- Raises:
TypeError – If the row parameter is not of type ‘int’ or an iterable collection of type ‘int’ representing the row indices to drop.
IndexError – If any provided row index is outside the current row range determined by the values at SQLDataModel.indicies.
Example:
import sqldatamodel as sdm

headers = ['Rank', 'Location', 'Population']
data = [
    (1, "Tokyo, Japan", 37.4),
    (2, "Delhi, India", 31.0),
    (3, "Shanghai, China", 27.1),
    (4, "São Paulo, Brazil", 22.0),
    (5, "Mexico City, Mexico", 21.8),
    (6, "Cairo, Egypt", 21.3),
    (7, "Dhaka, Bangladesh", 21.0),
    (8, "Mumbai, India", 20.7),
    (9, "Beijing, China", 20.5),
    (10, "Osaka, Japan", 19.1)
]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Drop the last row
df.drop_row(-1)

# Drop rows based on condition of less than 25 Million population
df.drop_row(df['Population'] < 25.0)

# View result
print(df)
This will output:
┌──────┬─────────────────┬────────────┐ │ Rank │ Location │ Population │ ├──────┼─────────────────┼────────────┤ │ 1 │ Tokyo, Japan │ 37.4 │ │ 2 │ Delhi, India │ 31.0 │ │ 3 │ Shanghai, China │ 27.1 │ └──────┴─────────────────┴────────────┘ [3 rows x 3 columns]
Dropping multiple rows and returning a new model:
# Create a new model using the same sample data
df = sdm.SQLDataModel(data, headers)

# Set row indices to drop
row_indices = range(0, 5) # or [0, 1, 2, 3, 4]

# Drop top 5 cities and return as a new model
df_new = df.drop_row(row_indices, inplace=False)

# View new model
print(df_new)
This will output:
┌──────┬───────────────────┬────────────┐ │ Rank │ Location │ Population │ ├──────┼───────────────────┼────────────┤ │ 6 │ Cairo, Egypt │ 21.3 │ │ 7 │ Dhaka, Bangladesh │ 21.0 │ │ 8 │ Mumbai, India │ 20.7 │ │ 9 │ Beijing, China │ 20.5 │ │ 10 │ Osaka, Japan │ 19.1 │ └──────┴───────────────────┴────────────┘ [5 rows x 3 columns]
Important
Rows are referenced by their integer index, and not by their value. This means that row index 0 will always refer to the first row in the model, and -1 will always refer to the last. This distinction is usually irrelevant when the two are aligned, however this is no longer the case when row(s) are dropped from anywhere except the very last row.
Note
The indices of the remaining rows are retained by default after rows are deleted, provide ignore_index=True to reset row indices if required.
The equivalent of this method can also be achieved by simply indexing the required rows and columns using sdm[rows, column] notation, see SQLDataModel.__getitem__() for additional details.
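The difference between a row's position and its retained index can be sketched in plain Python. This is only an illustration of the behavior described in the notes, not how SQLDataModel implements it; the labels list and the drop_position helper are hypothetical names introduced for the example.

```python
# Hypothetical illustration: every row keeps the index label it was created with
labels = [0, 1, 2, 3, 4]
rows = ["Tokyo", "Delhi", "Shanghai", "São Paulo", "Mexico City"]

def drop_position(labels, rows, position, ignore_index=False):
    """Drop the row at an integer position, optionally renumbering the labels."""
    position = position % len(rows)  # allow negative positions such as -1
    kept_rows = rows[:position] + rows[position + 1:]
    kept_labels = labels[:position] + labels[position + 1:]
    if ignore_index:
        # Renumber from zero so that labels and positions line up again
        kept_labels = list(range(len(kept_rows)))
    return kept_labels, kept_rows

# Dropping the first row leaves the remaining labels untouched
retained_labels, retained_rows = drop_position(labels, rows, 0)
print(retained_labels)   # [1, 2, 3, 4]
print(retained_rows[0])  # Delhi

# Resetting the index realigns labels with positions
reset_labels, _ = drop_position(labels, rows, 0, ignore_index=True)
print(reset_labels)      # [0, 1, 2, 3]
```

After the first drop, position 0 holds the row labelled 1, which is the case where the two notions stop agreeing.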
- Changelog:
- Version 0.7.4 (2024-06-13):
New method.
- dropna(axis: Literal['rows', 'columns'] = 'columns', how: Literal['any', 'all'] = 'all', strictly_null: bool = True, ignore_index: bool = True, inplace: bool = True) None | SQLDataModel[source]
Drop rows or columns with NA values from the SQLDataModel.
- Parameters:
axis (Literal['rows', 'columns'], optional) – The axis along which to drop NA values as either 'rows' or 'columns'. Default is 'columns', dropping columns with all NA values.
how (Literal['any', 'all'], optional) – Determines when to drop NA values, 'any' drops if any NA values are present, 'all' drops only if all values are NA. Default is 'all', dropping only when all the values are NA along the specified axis.
strictly_null (bool, optional) – If True, only strictly NULL values are considered NA. If False, additional representations of NA (e.g., ‘NaN’, ‘n/a’) are also considered. Default is True.
ignore_index (bool, optional) – If True, the index column is not considered when dropping rows. Ignored when axis is set to 'columns'. Default is True.
inplace (bool, optional) – If True, perform the operation in place and modify the SQLDataModel. If False, return a new SQLDataModel with the NA values dropped. Default is True.
- Returns:
If inplace=False returns a new SQLDataModel with the NA values dropped. Otherwise, modifies the current SQLDataModel in place and returns None.
- Return type:
None or SQLDataModel
- Raises:
ValueError – If axis is not one of 'rows' or 'columns', or how is not one of 'any' or 'all'.
DimensionError – If all columns are to be dropped when axis='columns' resulting in an invalid model schema.
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['Name', 'Age', 'Gender', 'City']
    data = [
        ('Sarah', 35, 'Female', 'Houston'),
        ('Alice', None, 'Female', 'Milwaukee'),
        ('Mike', None, 'Male', 'Atlanta'),
        ('John', 25, 'Male', 'Boston'),
        ('Bob', None, 'Male', 'Chicago'),
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Drop columns with any NA values in place
    df.dropna(axis='columns', how='any', inplace=True)

    # View result
    print(df)
This will output the updated model after dropping the ‘Age’ column:
    ┌───────┬────────┬───────────┐
    │ Name  │ Gender │ City      │
    ├───────┼────────┼───────────┤
    │ Sarah │ Female │ Houston   │
    │ Alice │ Female │ Milwaukee │
    │ Mike  │ Male   │ Atlanta   │
    │ John  │ Male   │ Boston    │
    │ Bob   │ Male   │ Chicago   │
    └───────┴────────┴───────────┘
Rows can also be used as the axis to check against:
    # Recreate the model from the original sample data
    df = sdm.SQLDataModel(data, headers)

    # Drop rows with any NA values, returning the result as a new model
    df = df.dropna(axis='rows', how='any', inplace=False)

    # View result
    print(df)
This will output the result containing only the rows where no NA values are present:
    ┌───────┬─────┬────────┬─────────┐
    │ Name  │ Age │ Gender │ City    │
    ├───────┼─────┼────────┼─────────┤
    │ Sarah │  35 │ Female │ Houston │
    │ John  │  25 │ Male   │ Boston  │
    └───────┴─────┴────────┴─────────┘
    [2 rows x 4 columns]
Note
A value is treated as NA in the specified axis when it is the SQL NULL value, or additionally when it is a ‘null like’ value if strictly_null = False.
See SQLDataModel.isna() or SQLDataModel.notna() to filter for rows containing null values.
See SQLDataModel.fillna() to fill all missing or null values in the model.
- Changelog:
- Version 1.0.0 (2024-08-09):
Changed default to inplace = True to align more with similar SQLDataModel methods.
- Version 0.12.3 (2024-07-11):
New method.
- dtypes[source]
The current model data types mapped to each column in the format of {'col': 'dtype'} where 'dtype' is a string representing the corresponding python type.
- Type:
dict[str, str]
- endswith(pat: str | Iterable[str], case: bool = True) set[int][source]
Return the row indices that end with the specified pattern(s) in any column from the model, converting to str(value) for comparison.
- Parameters:
pat (str | Iterable[str]) – The pattern or iterable of patterns to search for within the data.
case (bool, optional) – If True (default), the search is case-sensitive. If False, the search is case-insensitive.
- Raises:
TypeError – If argument for pat is not of type ‘str’ or an iterable of type ‘str’ representing the substring pattern(s).
- Returns:
Set of row indices containing values that match the pattern(s).
- Return type:
set[int]
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['Name', 'Age', 'Sex', 'City']
    data = [
        ('Mike', 31, 'M', 'Chicago'),
        ('John', 25, 'M', 'Dayton'),
        ('Alice', 27, 'F', 'Boston'),
        ('Sarah', 35, 'F', 'Houston'),
        ('Bobby', 42, 'M', 'Chicago'),
        ('Steve', 28, 'F', 'Austin'),
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Filter for rows where the 'City' column ends with the string 'ston'
    matching_indices = df['City'].endswith('ston')

    # Apply filter to model
    df_suffix = df[matching_indices]

    # View result
    print(df_suffix)
This will output the result of applying the filter to the model:
    ┌───┬───────┬─────┬─────┬─────────┐
    │   │ Name  │ Age │ Sex │ City    │
    ├───┼───────┼─────┼─────┼─────────┤
    │ 2 │ Alice │  27 │ F   │ Boston  │
    │ 3 │ Sarah │  35 │ F   │ Houston │
    └───┴───────┴─────┴─────┴─────────┘
    [2 rows x 4 columns]
Instead of searching a single column, the entire model can be searched:
    # Method can also search all columns, and be applied directly
    df_n = df[df.endswith('N', case=False)]

    # View result
    print(df_n)
This will output the result of a case-insensitive search:
    ┌───┬───────┬─────┬─────┬─────────┐
    │   │ Name  │ Age │ Sex │ City    │
    ├───┼───────┼─────┼─────┼─────────┤
    │ 1 │ John  │  25 │ M   │ Dayton  │
    │ 2 │ Alice │  27 │ F   │ Boston  │
    │ 3 │ Sarah │  35 │ F   │ Houston │
    │ 5 │ Steve │  28 │ F   │ Austin  │
    └───┴───────┴─────┴─────┴─────────┘
    [4 rows x 4 columns]
This can be used in combination with the setitem syntax to selectively update values as well:
    # Create a new column 'Parity' with a default value
    df['Parity'] = None

    # Create string patterns for even or odd suffixes, matching the documented type of pat
    even_suffixes = ['0', '2', '4', '6', '8']
    odd_suffixes = ['1', '3', '5', '7', '9']

    # Create the filters for both outcomes
    even_filter = df.endswith(even_suffixes)
    odd_filter = df.endswith(odd_suffixes)

    # Update values based on filters using setitem syntax
    df[even_filter, 'Parity'] = 'Even'
    df[odd_filter, 'Parity'] = 'Odd'

    # View result
    print(df)
This will output the result of selectively applying updates based on our filters:
    ┌───┬───────┬─────┬─────┬─────────┬────────┐
    │   │ Name  │ Age │ Sex │ City    │ Parity │
    ├───┼───────┼─────┼─────┼─────────┼────────┤
    │ 0 │ Mike  │  31 │ M   │ Chicago │ Odd    │
    │ 1 │ John  │  25 │ M   │ Dayton  │ Odd    │
    │ 2 │ Alice │  27 │ F   │ Boston  │ Odd    │
    │ 3 │ Sarah │  35 │ F   │ Houston │ Odd    │
    │ 4 │ Bobby │  42 │ M   │ Chicago │ Even   │
    │ 5 │ Steve │  28 │ F   │ Austin  │ Even   │
    └───┴───────┴─────┴─────┴─────────┴────────┘
    [6 rows x 5 columns]
Note
Any non-string values are converted using str(value) for comparisons only.
See SQLDataModel.__eq__() for strict equality comparison operations.
See SQLDataModel.__and__() for more details on bitwise and set operations.
See SQLDataModel.__setitem__() for more details on syntax df[row, column] = value and correct usage.
See SQLDataModel.contains() and SQLDataModel.startswith() for additional string methods.
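The matching rule described in the notes, converting each value with str(value) and returning a set of row positions, can be sketched in plain Python. The endswith function and its column argument here are hypothetical and only mirror the documented behavior; the library itself selects a column with df['City'] instead.

```python
def endswith(rows, pat, case=True, column=None):
    """Return the set of row positions holding a value that ends with any pattern."""
    patterns = [pat] if isinstance(pat, str) else list(pat)
    if not all(isinstance(p, str) for p in patterns):
        raise TypeError("pat must be a str or an iterable of str")
    if not case:
        patterns = [p.lower() for p in patterns]
    matches = set()
    for position, row in enumerate(rows):
        # Search a single column when one is given, otherwise every value in the row
        values = row if column is None else (row[column],)
        for value in values:
            text = str(value) if case else str(value).lower()
            if text.endswith(tuple(patterns)):
                matches.add(position)
                break
    return matches

data = [
    ('Mike', 31, 'M', 'Chicago'),
    ('John', 25, 'M', 'Dayton'),
    ('Alice', 27, 'F', 'Boston'),
    ('Sarah', 35, 'F', 'Houston'),
    ('Bobby', 42, 'M', 'Chicago'),
    ('Steve', 28, 'F', 'Austin'),
]

city_matches = endswith(data, 'ston', column=3)
any_column_matches = endswith(data, 'N', case=False)
even_ages = endswith(data, ['0', '2', '4', '6', '8'])
```

Because the result is a set, filters from separate calls can be combined with the usual set operators such as & and |.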
- Changelog:
- Version 0.7.8 (2024-06-18):
New method.
- execute_fetch(sql_query: str, sql_params: tuple = None, **kwargs) SQLDataModel[source]
Returns a new SQLDataModel object, including display and style properties, after executing the provided SQL query using the current SQLDataModel. This method is called by other methods which expect results to be returned from their execution.
- Parameters:
sql_query (str) – The SQL query to execute with the expectation of rows returned.
sql_params (tuple, optional) – The SQL parameters to provide for parameterized queries.
**kwargs (optional) – Additional keyword args to pass to SQLDataModel constructor.
- Raises:
SQLProgrammingError – If the provided SQL query is invalid or malformed.
ValueError – If the provided SQL query was valid but returned 0 rows, which is insufficient to return a new model.
- Returns:
A new SQLDataModel instance containing the result of the SQL query.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('example.csv', headers=['Column1', 'Column2'])

    # Create the SQL query to execute
    query = 'SELECT * FROM sdm WHERE Column1 > 10'

    # Fetch and save the result to a new instance
    result_model = df.execute_fetch(query)

    # Create a parameterized SQL query to execute
    query = 'SELECT * FROM sdm WHERE Column1 > ? OR Column2 < ?'
    params = (10, 20)

    # Provide the SQL and the statement parameters
    result_parameterized = df.execute_fetch(query, params)
Important
The default table name is 'sdm', you can use SQLDataModel.set_model_name() to modify the name used by SQLDataModel.
Note
Display properties such as float precision, index column or table styling are also passed to the new instance when not provided in kwargs.
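How the ? placeholders are bound to sql_params can be seen with Python's built-in sqlite3 module. This is a standalone sketch, not SQLDataModel code: the table is simply named 'sdm' to mirror the default model name, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with a table named 'sdm', mirroring the default model name
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE sdm ("Column1" INTEGER, "Column2" INTEGER)')
con.executemany("INSERT INTO sdm VALUES (?, ?)", [(5, 50), (15, 10), (25, 30)])

# Values are bound to the '?' placeholders in order instead of being pasted into the SQL text
query = "SELECT * FROM sdm WHERE Column1 > ? OR Column2 < ?"
params = (10, 20)
cursor = con.execute(query, params)

# Column names are read from the cursor metadata, independent of how many rows come back
result_headers = [col[0] for col in cursor.description]
result_rows = cursor.fetchall()
con.close()

print(result_headers)
print(result_rows)
```

Binding values this way avoids quoting problems and SQL injection that string formatting would invite.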
- Changelog:
- Version 2.3.0 (2026-01-21):
Modified to allow returning an empty result set from execution of sql_query, returning a model with zero rows that uses the query metadata for column names and dimension.
- Version 0.6.2 (2024-05-15):
Inclusion of SQLDataModel.table_style argument in returned SQLDataModel to inherit all display properties in result.
- Version 0.1.9 (2024-03-19):
New method.
- execute_statement(sql_stmt: str, sql_params: tuple = None, update_row_meta: bool = True) None[source]
Executes an arbitrary SQL query against the current model without the expectation of selection or returned rows.
- Parameters:
sql_stmt (str) – The SQL query to execute.
sql_params (tuple, optional) – The SQL parameters to provide for parameterized queries.
update_row_meta (bool, optional) – Whether the row count metadata should be updated after executing the statement. Default is True, using SQLDataModel._update_model_metadata() to ensure any schema modifications remain in sync.
- Raises:
SQLProgrammingError – If the SQL execution fails.
- Returns:
None
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('data.csv')

    # Execute statement without results, modifying column in place
    df.execute_statement('UPDATE table SET column = value WHERE condition')

    # Execute a parameterized statement by providing values
    df.execute_statement('DELETE FROM table WHERE idx = ? or name = ?', (7,'Bob'))
Note
To execute a query with the expectation of results, see SQLDataModel.execute_fetch() method.
To execute multiple queries within a single transaction, see SQLDataModel.execute_transaction() method.
- Changelog:
- Version 0.8.0 (2024-06-21):
Added update_row_meta parameter to speed up transactions that are guaranteed to have no effect on the current model SQLDataModel.indicies metadata. A shallower and computationally cheaper check will still occur to ensure SQLDataModel.header_master remains in sync.
- Version 0.7.4 (2024-06-13):
Added sql_params parameter to allow parameterized statements similar to other SQL execution methods.
- Version 0.1.9 (2024-03-19):
New method.
- execute_transaction(sql_script: str, update_row_meta: bool = True) None[source]
Executes a prepared SQL script wrapped in a transaction against the current model without the expectation of selection or returned rows.
- Parameters:
sql_script (str) – The SQL script to execute within a transaction.
update_row_meta (bool, optional) – Whether the row count metadata should be updated after executing the transaction. Default is True, using SQLDataModel._update_model_metadata() to ensure any schema modifications remain in sync.
- Raises:
SQLProgrammingError – If the provided sql_script cannot be executed or the SQL execution fails.
- Returns:
None
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('data.csv')

    # Script to update columns with predicate
    transaction_script = '''
        UPDATE table1 SET column1 = value1 WHERE condition1;
        UPDATE table2 SET column2 = value2 WHERE condition2;
    '''

    # Execute the script
    df.execute_transaction(transaction_script)
Note
Use SQLDataModel.execute_fetch() method if the SQL script is expected to return a selection or result set upon execution.
Use SQLDataModel.execute_statement() method if the SQL script is not expected to return a selection, but parameter bindings and values are needed.
Many other methods rely heavily on the SQLDataModel.execute_transaction() method, therefore modifying it may adversely affect them.
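What wrapping statements in a transaction buys can be shown with Python's built-in sqlite3 module: either every statement is applied or none is. This is a generic sketch, not the library's implementation; the run_in_transaction helper and the sample table are hypothetical, and it takes a list of statements rather than a single script string.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sdm (name TEXT, balance INTEGER CHECK (balance >= 0))")
con.executemany("INSERT INTO sdm VALUES (?, ?)", [("Alice", 100), ("Bob", 50)])
con.commit()

def run_in_transaction(con, statements):
    """Apply every statement, or none of them if any statement fails."""
    try:
        with con:  # commits on success, rolls back if an exception is raised
            for stmt in statements:
                con.execute(stmt)
        return True
    except sqlite3.Error:
        return False

succeeded = run_in_transaction(con, [
    "UPDATE sdm SET balance = balance - 30 WHERE name = 'Alice'",
    "UPDATE sdm SET balance = balance + 30 WHERE name = 'Bob'",
])

# The second statement violates the CHECK constraint, so the first one is rolled back too
failed = run_in_transaction(con, [
    "UPDATE sdm SET balance = balance + 500 WHERE name = 'Alice'",
    "UPDATE sdm SET balance = balance - 500 WHERE name = 'Bob'",
])

balances = dict(con.execute("SELECT name, balance FROM sdm"))
con.close()
print(succeeded, failed, balances)
```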
- Changelog:
- Version 0.8.0 (2024-06-21):
Added update_row_meta parameter to speed up transactions that are guaranteed to have no effect on the current model SQLDataModel.indicies metadata. A shallower and computationally cheaper check will still occur to ensure SQLDataModel.header_master remains in sync.
- Version 0.1.9 (2024-03-19):
New method.
- fillna(value, strictly_null: bool = False, inplace: bool = True) None | SQLDataModel[source]
Fills missing (na or nan) values in the current SQLDataModel with the provided value inplace or as a new instance.
- Parameters:
value – The scalar value to fill missing values with. Should be of type ‘str’, ‘int’, ‘float’, ‘bytes’, or ‘bool’.
inplace (bool) – If True, modifies the current instance in-place. If False, returns a new instance with missing values filled. Default is True.
strictly_null (bool) – If True, only strictly null values are filled. If False, values like 'NA', 'NaN', 'n/a', 'na', and whitespace only strings are also filled. Default is False.
- Raises:
TypeError – If value is not a scalar type or is incompatible with SQLite’s type system.
- Returns:
When inplace=True modifies model inplace, returning None, when inplace=False a new SQLDataModel is returned.
- Return type:
None or SQLDataModel
Example:
    import sqldatamodel as sdm

    # Create sample data
    data = [('Alice', 25, None), ('Bob', None, 'N/A'), ('Charlie', 'NaN', ' '), ('David', 30, 'NA')]

    # Create the model
    df = sdm.SQLDataModel(data, headers=['Name', 'Age', 'Status'])

    # Fill missing values with 0
    df_filled = df.fillna(value=0, strictly_null=False, inplace=False)

    # View filled model
    print(df_filled)
This will output:
    ┌───┬─────────┬──────┬────────┐
    │   │ Name    │ Age  │ Status │
    ├───┼─────────┼──────┼────────┤
    │ 0 │ Alice   │   25 │      0 │
    │ 1 │ Bob     │    0 │      0 │
    │ 2 │ Charlie │    0 │      0 │
    │ 3 │ David   │   30 │      0 │
    └───┴─────────┴──────┴────────┘
    [4 rows x 3 columns]
Note
The method supports filling missing values with various scalar types which are then adapted to the column's set dtype.
The strictly_null parameter controls whether additional values like ('NA', 'NAN', 'n/a', 'na', ''), with the last being an empty string, are treated as null.
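The effect of strictly_null can be sketched in plain Python. The set of null-like spellings and the case-insensitive, whitespace-stripping comparison used here are assumptions made for the illustration, and fill_na is a hypothetical stand-in rather than the library's code.

```python
# Assumed null-like spellings, compared case-insensitively after stripping whitespace
NULL_LIKE = {"", "na", "nan", "n/a"}

def fill_na(rows, value, strictly_null=False):
    """Return new rows with missing values replaced by the provided scalar."""
    def is_missing(v):
        if v is None:
            return True
        if strictly_null:
            return False
        return isinstance(v, str) and v.strip().lower() in NULL_LIKE
    return [tuple(value if is_missing(v) else v for v in row) for row in rows]

data = [('Alice', 25, None), ('Bob', None, 'N/A'), ('Charlie', 'NaN', ' '), ('David', 30, 'NA')]

filled = fill_na(data, 0)                      # null-like strings are filled as well
strict = fill_na(data, 0, strictly_null=True)  # only None values are filled

print(filled)
print(strict)
```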
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- freeze_index(column_name: str = None) None[source]
Freeze the current index as a new column, expanding it into the current model. The new column is unaffected by any future changes to the primary index column.
- Parameters:
column_name (str, optional) – The name for the new frozen index column. If not provided, a default name ‘frzn_id’ will be used.
- Raises:
TypeError – If the provided column_name is not of type ‘str’.
- Returns:
None
Example:
    import sqldatamodel as sdm

    headers = ['first', 'last', 'age', 'service', 'hire_date']
    data = [
        ('John', 'Smith', 27, 1.22, '2023-02-01'),
        ('Sarah', 'West', 39, 0.7, '2023-10-01'),
        ('Mike', 'Harlin', 36, 3.9, '2020-08-27'),
        ('Pat', 'Douglas', 42, 11.5, '2015-11-06'),
        ('Kelly', 'Lee', 32, 8.0, '2016-09-18')
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Freeze index as new column 'id'
    df.freeze_index("id")

    # View model
    print(df)
This will output:
    ┌───┬───────┬─────────┬──────┬─────────┬────────────┬──────┐
    │   │ first │ last    │ age  │ service │ hire_date  │ id   │
    ├───┼───────┼─────────┼──────┼─────────┼────────────┼──────┤
    │ 0 │ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │    0 │
    │ 1 │ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │    1 │
    │ 2 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │    2 │
    │ 3 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │    3 │
    │ 4 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │    4 │
    └───┴───────┴─────────┴──────┴─────────┴────────────┴──────┘
    [5 rows x 6 columns]
Note
Freezing the index will assign the current SQLDataModel.sql_idx for each row as a new column, leaving the current index in place.
To modify the actual SQLDataModel.sql_idx value, use the SQLDataModel.reset_index() method instead.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- classmethod from_csv(csv_source: str, infer_types: bool = True, encoding: str = 'Latin1', delimiter: str = ',', quotechar: str = '"', headers: list[str] = None, **kwargs) SQLDataModel[source]
Returns a new SQLDataModel generated from the provided CSV source, which can be either a file path or a raw delimited string.
- Parameters:
csv_source (str) – The path to the CSV file or a raw delimited string.
infer_types (bool, optional) – Infer column types based on random subset of data. Default is True, when False, all columns are str type.
encoding (str, optional) – The encoding used to decode the CSV source if it is a file. Default is ‘Latin1’.
delimiter (str, optional) – The delimiter to use when parsing CSV source. Default is ','.
quotechar (str, optional) – The character used for quoting fields. Default is '"'.
headers (List[str], optional) – List of column headers. If None, the first row of the CSV source is assumed to contain headers.
**kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the provided CSV source.
- Return type:
SQLDataModel
- Raises:
ValueError – If no delimited data is found in csv_source or if parsing with delimiter does not yield valid tabular data.
Exception – If an error occurs while attempting to read from or process the provided CSV source.
Examples:
From CSV File
    import sqldatamodel as sdm

    # CSV file path or raw CSV string
    csv_source = "/path/to/data.csv"

    # Create the model using the CSV file, providing custom headers
    df = sdm.from_csv(csv_source, headers=['ID', 'Name', 'Value'])
From CSV Literal
    import sqldatamodel as sdm

    # CSV data
    data = '''
    A, B, C
    1a, 1b, 1c
    2a, 2b, 2c
    3a, 3b, 3c
    '''

    # Create the model
    df = sdm.from_csv(data)

    # View result
    print(df)
This will output:
    ┌──────┬──────┬──────┐
    │ A    │ B    │ C    │
    ├──────┼──────┼──────┤
    │ 1a   │ 1b   │ 1c   │
    │ 2a   │ 2b   │ 2c   │
    │ 3a   │ 3b   │ 3c   │
    └──────┴──────┴──────┘
    [3 rows x 3 columns]
Note
If csv_source is delimited by characters other than those specified, use SQLDataModel.from_delimited() and provide delimiter to delimiters.
If headers are provided, the first row parsed from source will be the first row in the table and not discarded.
The infer_types argument can be used to infer the appropriate data type for each column:
When infer_types = True, a random subset of the data will be used to infer the correct type and cast values accordingly.
When infer_types = False, values from the first row only will be used to assign types, almost always ‘str’ when reading from CSV.
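The note on headers, that providing them keeps the first parsed row as data, can be illustrated with Python's built-in csv module. This is a sketch of the documented behavior only; parse_csv is a hypothetical helper and not how the library parses its input.

```python
import csv
import io

def parse_csv(text, headers=None, delimiter=","):
    """Parse delimited text, using the first row as headers only when none are given."""
    reader = csv.reader(io.StringIO(text.strip()), delimiter=delimiter, skipinitialspace=True)
    rows = [row for row in reader if row]
    if headers is None:
        # No headers provided: the first parsed row is taken as the header row
        headers, rows = rows[0], rows[1:]
    return headers, rows

text = """
A, B, C
1a, 1b, 1c
2a, 2b, 2c
"""

inferred_headers, inferred_rows = parse_csv(text)
custom_headers, custom_rows = parse_csv(text, headers=["X", "Y", "Z"])

print(inferred_headers, len(inferred_rows))  # first row consumed as headers
print(custom_headers, len(custom_rows))      # first row kept as data
```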
- Changelog:
- Version 0.4.0 (2024-04-23):
Modified to only parse CSV files and removed all delimiter sniffing with introduction of new method SQLDataModel.from_delimited() to handle other delimiters.
Renamed delimiters parameter to delimiter with ',' set as new default to reflect revised focus on CSV files only.
- classmethod from_data(data: Any = None, **kwargs) SQLDataModel[source]
Convenience method to infer the source of data and return the appropriate constructor method to generate a new SQLDataModel instance.
- Parameters:
data (Any, required) – The input data from which to create the SQLDataModel object.
**kwargs – Additional keyword arguments to be passed to the constructor method, see init method for arguments.
- Constructor methods are called according to the input type:
dict: If all values are python datatypes, passed as dtypes to constructor, otherwise as data to SQLDataModel.from_dict().
list: If single dimension, passed as headers to constructor, otherwise as data containing list of lists.
tuple: Same as with list, if single dimension passed as headers, otherwise as data containing tuple of lists.
numpy.ndarray: passed to SQLDataModel.from_numpy() as array data.
pandas.DataFrame: passed to SQLDataModel.from_pandas() as dataframe data.
polars.DataFrame: passed to SQLDataModel.from_polars() as dataframe data.
str: If starts with ‘http’, passed to SQLDataModel.from_html() as url, otherwise:
    '.csv': passed to SQLDataModel.from_csv() as csv source data.
    '.html': passed to SQLDataModel.from_html() as html source data.
    '.json': passed to SQLDataModel.from_json() as json source data.
    '.md': passed to SQLDataModel.from_markdown() as markdown source data.
    '.parquet': passed to SQLDataModel.from_parquet() as parquet source data.
    '.pkl': passed to SQLDataModel.from_pickle() as pickle source data.
    '.sdm': passed to SQLDataModel.from_pickle() as pickle source data.
    '.tex': passed to SQLDataModel.from_latex() as latex source data.
    '.tsv': passed to SQLDataModel.from_csv() as csv source data.
    '.txt': passed to SQLDataModel.from_text() as text source data.
    '.xlsx': passed to SQLDataModel.from_excel() as excel source data.
    '.xml': passed to SQLDataModel.from_xml() as xml source data.
- Returns:
The SQLDataModel object created from the provided data.
- Return type:
SQLDataModel
- Raises:
TypeError – If the type of data is not supported.
ValueError – If the file extension is not found, unsupported, or if the SQL extension is not supported.
Exception – If an OS related error occurs during file read operations if data is a filepath.
Example:
    import sqldatamodel as sdm

    # Create SQLDataModel from a CSV file
    df_csv = sdm.from_data("data.csv", headers=['ID', 'Name', 'Value'])

    # Create SQLDataModel from a dictionary of column data types
    df_dict = sdm.from_data({"ID": int, "Name": str, "Value": float})

    # Create SQLDataModel from a list of tuples
    df_list = sdm.from_data([(1, 'Alice', 100.0), (2, 'Bob', 200.0)], headers=['ID', 'Name', 'Value'])

    # Create SQLDataModel from raw string literal
    delimited_literal = '''
    A, B, C
    1, 2, 3
    4, 5, 6
    7, 8, 9
    '''

    # Create the model by having correct constructor inferred
    df = sdm.from_data(delimited_literal)

    # View output
    print(df)
This will output:
    ┌────┬────┬────┐
    │ A  │ B  │ C  │
    ├────┼────┼────┤
    │  1 │  2 │  3 │
    │  4 │  5 │  6 │
    │  7 │  8 │  9 │
    └────┴────┴────┘
    [3 rows x 3 columns]
Note
This method attempts to infer the correct method to call based on data argument, if one cannot be inferred an exception is raised.
For data type specific implementation or examples, see related method for appropriate data type.
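The string branch of the dispatch rules listed for this method can be sketched as a lookup from file extension to constructor name. This is only an illustration of the documented mapping: constructor_for_path is a hypothetical helper, it returns method names rather than calling them, and it does not cover raw string literals, which the real method also accepts.

```python
import os

# Extension to constructor-name mapping, copied from the documented list
EXTENSION_CONSTRUCTORS = {
    ".csv": "from_csv", ".html": "from_html", ".json": "from_json",
    ".md": "from_markdown", ".parquet": "from_parquet", ".pkl": "from_pickle",
    ".sdm": "from_pickle", ".tex": "from_latex", ".tsv": "from_csv",
    ".txt": "from_text", ".xlsx": "from_excel", ".xml": "from_xml",
}

def constructor_for_path(source):
    """Return the constructor name the documented rules select for a url or file path."""
    if not isinstance(source, str):
        raise TypeError("this sketch only covers the str branch")
    if source.startswith("http"):
        return "from_html"
    extension = os.path.splitext(source)[1]
    if extension not in EXTENSION_CONSTRUCTORS:
        raise ValueError(f"unsupported or missing file extension: {extension!r}")
    return EXTENSION_CONSTRUCTORS[extension]

print(constructor_for_path("data.csv"))     # from_csv
print(constructor_for_path("model.sdm"))    # from_pickle
print(constructor_for_path("https://en.wikipedia.org/wiki/FIFA_World_Cup"))  # from_html
```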
- classmethod from_delimited(source: str, infer_types: bool = True, encoding: str = 'Latin1', delimiters: str = ', \t;|:', quotechar: str = '"', headers: list[str] = None, **kwargs) SQLDataModel[source]
Returns a new SQLDataModel generated from the provided delimited source, which can be either a file path or a raw delimited string.
- Parameters:
source (str) – The path to the delimited file or a raw delimited string.
infer_types (bool, optional) – Infer column types based on random subset of data. Default is True, when False, all columns are str type.
encoding (str, optional) – The encoding used to decode the source if it is a file. Default is 'Latin1'.
delimiters (str, optional) – Possible delimiters. Default is \s, \t, ;, |, : or , (space, tab, semicolon, pipe, colon or comma).
quotechar (str, optional) – The character used for quoting fields. Default is '"'.
headers (list[str], optional) – List of column headers. If None, the first row of the delimited source is assumed to be the header row.
**kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the provided delimited source.
- Return type:
SQLDataModel
- Raises:
ValueError – If no delimiter is found in source or if parsing with delimiter does not yield valid tabular data.
Exception – If an error occurs while attempting to read from or process the provided delimited source.
Example:
From Delimited Literal
    import sqldatamodel as sdm

    # Space delimited literal
    source_data = '''
    Name Age Height
    Beth 27 172.4
    Kate 28 162.0
    John 30 175.3
    Will 35 185.8'''

    # Create the model
    df = sdm.from_delimited(source_data)

    # View output
    print(df)
This will output:
    ┌──────┬─────┬─────────┐
    │ Name │ Age │ Height  │
    ├──────┼─────┼─────────┤
    │ Beth │  27 │  172.40 │
    │ Kate │  28 │  162.00 │
    │ John │  30 │  175.30 │
    │ Will │  35 │  185.80 │
    └──────┴─────┴─────────┘
    [4 rows x 3 columns]
From Delimited File
    import sqldatamodel as sdm

    # Tab separated file
    tsv_file = 'persons.tsv'

    # Create the model
    df = sdm.from_delimited(tsv_file)
Note
Use SQLDataModel.from_csv() if delimiter in source is already known and available as this method requires more compute to determine a plausible delimiter.
Use SQLDataModel.from_text() if data is not delimited but is a string representation such as an ASCII table or the output from another SQLDataModel instance.
If file is delimited by delimiters other than the default targets \s, \t, ;, |, : or , (space, tab, semicolon, pipe, colon or comma) make sure they are provided as single character values to delimiters.
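One simple way to determine a plausible delimiter is to test each candidate and keep the first one that splits every line into the same number of fields. The heuristic below is an assumption made for illustration and is not the detection logic used by the library; guess_delimiter is a hypothetical name.

```python
def guess_delimiter(text, delimiters=", \t;|:"):
    """Return the first candidate that splits every non-empty line into the same number of fields."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for candidate in delimiters:
        field_counts = {len(line.split(candidate)) for line in lines}
        # A usable delimiter yields one consistent field count greater than one
        if len(field_counts) == 1 and field_counts.pop() > 1:
            return candidate
    raise ValueError("no delimiter found in source")

source_data = '''
Name Age Height
Beth 27 172.4
Kate 28 162.0
'''

delimiter = guess_delimiter(source_data)
parsed = [line.split(delimiter) for line in source_data.strip().splitlines()]

print(repr(delimiter))
print(parsed)
```

Testing several candidates against every line is the extra work that a known delimiter avoids.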
- Changelog:
- Version 0.4.0 (2024-04-23):
New method.
- classmethod from_dict(data: dict | list, **kwargs) SQLDataModel[source]
Create a new SQLDataModel instance from the provided dictionary.
- Parameters:
data (dict | list) – The dictionary or list of dictionaries to convert to SQLDataModel. If keys are of type int, they will be used as row indexes; otherwise, keys will be used as headers.
**kwargs – Additional arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the provided dictionary.
- Return type:
SQLDataModel
- Raises:
TypeError – If the provided dictionary values are not of type ‘list’, ‘tuple’, or ‘dict’.
ValueError – If the provided data appears to be a list of dicts but is empty.
Example:
    import sqldatamodel as sdm

    # Sample data with column orientation
    data = {
        'Name': ['Beth', 'John', 'Alice', 'Travis'],
        'Height': [172.4, 175.3, 162.0, 185.8],
        'Age': [27, 30, 28, 35]
    }

    # Create the model
    df = sdm.from_dict(data)

    # View it
    print(df)
This will output:
    ┌────────┬─────────┬─────┐
    │ Name   │ Height  │ Age │
    ├────────┼─────────┼─────┤
    │ Beth   │  172.40 │  27 │
    │ John   │  175.30 │  30 │
    │ Alice  │  162.00 │  28 │
    │ Travis │  185.80 │  35 │
    └────────┴─────────┴─────┘
    [4 rows x 3 columns]
We can also create a model using a dictionary with row orientation:
    import sqldatamodel as sdm

    # Sample data with row orientation
    data = {
        0: ['Mercury', 0.38],
        1: ['Venus', 0.91],
        2: ['Earth', 1.00],
        3: ['Mars', 0.38]
    }

    # Create the model with custom headers
    df = sdm.from_dict(data, headers=['Planet', 'Gravity'])

    # View output
    print(df)
This will output the model created using row-wise dictionary data:
    ┌─────────┬─────────┐
    │ Planet  │ Gravity │
    ├─────────┼─────────┤
    │ Mercury │    0.38 │
    │ Venus   │    0.91 │
    │ Earth   │    1.00 │
    │ Mars    │    0.38 │
    └─────────┴─────────┘
    [4 rows x 2 columns]
Note
If data orientation suggests JSON like structure, then SQLDataModel.from_json() will attempt to construct the model.
Dictionaries in list like orientation can also be used with structures similar to JSON objects.
The method determines the structure of the SQLDataModel based on the format of the provided dictionary.
If the keys are integers, they are used as row indexes; otherwise, keys are used as headers.
See SQLDataModel.to_dict() for converting existing instances of SQLDataModel to dictionaries.
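The orientation rule in the notes, integer keys meaning rows and any other keys meaning headers, can be sketched in plain Python. The dict_to_table helper is hypothetical, keeps rows in insertion order as an assumption, and leaves out the JSON fallback mentioned in the notes.

```python
def dict_to_table(data, headers=None):
    """Return (headers, rows) from a row-oriented or column-oriented dictionary."""
    if not isinstance(data, dict) or not data:
        raise TypeError("data must be a non-empty dict")
    if all(isinstance(key, int) for key in data):
        # Row orientation: integer keys are row indexes and values are the rows
        return headers, [tuple(row) for row in data.values()]
    # Column orientation: keys are headers and values are equal-length columns
    return list(data), list(zip(*data.values()))

by_column = {
    'Name': ['Beth', 'John', 'Alice', 'Travis'],
    'Height': [172.4, 175.3, 162.0, 185.8],
    'Age': [27, 30, 28, 35],
}
by_row = {0: ['Mercury', 0.38], 1: ['Venus', 0.91], 2: ['Earth', 1.00], 3: ['Mars', 0.38]}

column_headers, column_rows = dict_to_table(by_column)
row_headers, row_rows = dict_to_table(by_row, headers=['Planet', 'Gravity'])

print(column_headers, column_rows[0])
print(row_headers, row_rows[0])
```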
- Changelog:
- Version 0.6.3 (2024-05-16):
Modified to try parsing input data as JSON if initial inspection does not signify row or column orientation.
- Version 0.1.5 (2023-11-24):
New method.
- classmethod from_excel(filename: str, worksheet: int | str = 0, min_row: int | None = None, max_row: int | None = None, min_col: int | None = None, max_col: int | None = None, headers: list[str] = None, **kwargs) SQLDataModel[source]
Returns a new SQLDataModel instance from the specified Excel file.
- Parameters:
filename (str) – The file path to the Excel file, e.g., filename = 'titanic.xlsx'.
worksheet (int | str, optional) – The index or name of the worksheet to read from. Defaults to 0, indicating the first worksheet.
min_row (int | None, optional) – The minimum row number to start reading data from. Defaults to None, indicating the first row.
max_row (int | None, optional) – Maximum row index (1-based) to import. Defaults to None, indicating all rows are read.
min_col (int | None, optional) – Minimum column index (1-based) to import. Defaults to None, indicating the first column.
max_col (int | None, optional) – Maximum column index (1-based) to import. Defaults to None, indicating all the columns are read.
headers (List[str], optional) – The column headers for the data. Default is None, using the first row of the Excel sheet as headers.
**kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.
- Raises:
ModuleNotFoundError – If the required package openpyxl is not installed as determined by optionals._has_xl flag.
TypeError – If the filename argument is not of type ‘str’ representing a valid Excel file path.
Exception – If an error occurs during Excel read and write operations related to openpyxl processing.
- Returns:
A new instance of SQLDataModel created from the Excel file.
- Return type:
SQLDataModel
Examples:
We’ll use this Excel file, data.xlsx, as the source for the below examples:
        ┌───────┬─────┬────────┬───────────┐
        │ A     │ B   │ C      │ D         │
    ┌───┼───────┼─────┼────────┼───────────┤
    │ 1 │ Name  │ Age │ Gender │ City      │
    │ 2 │ John  │  25 │ Male   │ Boston    │
    │ 3 │ Alice │  30 │ Female │ Milwaukee │
    │ 4 │ Bob   │  22 │ Male   │ Chicago   │
    │ 5 │ Sarah │  35 │ Female │ Houston   │
    │ 6 │ Mike  │  28 │ Male   │ Atlanta   │
    └───┴───────┴─────┴────────┴───────────┘
    [ Sheet1 ]
Example 1: Load Excel file with default parameters
    import sqldatamodel as sdm

    # Create the model using the default parameters
    df = sdm.from_excel('data.xlsx')

    # View imported data
    print(df)
This will output all of the data starting from ‘A1’:
    ┌───────┬──────┬────────┬───────────┐
    │ Name  │ Age  │ Gender │ City      │
    ├───────┼──────┼────────┼───────────┤
    │ John  │   25 │ Male   │ Boston    │
    │ Alice │   30 │ Female │ Milwaukee │
    │ Bob   │   22 │ Male   │ Chicago   │
    │ Sarah │   35 │ Female │ Houston   │
    │ Mike  │   28 │ Male   │ Atlanta   │
    └───────┴──────┴────────┴───────────┘
    [5 rows x 4 columns]
Example 2: Load Excel file from specific worksheet
    import sqldatamodel as sdm

    # Create the model from 'Sheet2'
    df = sdm.from_excel('data.xlsx', worksheet='Sheet2')

    # View imported data
    print(df)
This will output the contents of ‘Sheet2’:
    ┌────────┬───────┐
    │ Gender │ count │
    ├────────┼───────┤
    │ Male   │     3 │
    │ Female │     2 │
    └────────┴───────┘
    [2 rows x 2 columns]
Example 3: Load Excel file with custom headers starting from different row
    import sqldatamodel as sdm

    # Use our own headers instead of the Excel ones
    new_cols = ['Col A', 'Col B', 'Col C', 'Col D']

    # Create the model starting from the 2nd row to ignore the original headers
    df = sdm.from_excel('data.xlsx', min_row=2, headers=new_cols)

    # View the data
    print(df)
This will output the data with our renamed headers:
    ┌───────┬───────┬────────┬───────────┐
    │ Col A │ Col B │ Col C  │ Col D     │
    ├───────┼───────┼────────┼───────────┤
    │ John  │    25 │ Male   │ Boston    │
    │ Alice │    30 │ Female │ Milwaukee │
    │ Bob   │    22 │ Male   │ Chicago   │
    │ Sarah │    35 │ Female │ Houston   │
    │ Mike  │    28 │ Male   │ Atlanta   │
    └───────┴───────┴────────┴───────────┘
    [5 rows x 4 columns]
Example 4: Load Excel file with specific subset of columns
    import sqldatamodel as sdm

    # Create the model using the middle two columns only
    df = sdm.from_excel('data.xlsx', min_col=2, max_col=3)

    # View the data
    print(df)
This will output only the middle two columns:
    ┌──────┬────────┐
    │ Age  │ Gender │
    ├──────┼────────┤
    │   25 │ Male   │
    │   30 │ Female │
    │   22 │ Male   │
    │   35 │ Female │
    │   28 │ Male   │
    └──────┴────────┘
    [5 rows x 2 columns]
Note
This method entirely relies on openpyxl, see their amazing documentation for further information on Excel file handling in python.
If custom headers are provided using the default min_row, then the original headers, if present, will also appear as the first row of data.
All indices for min_row, max_row, min_col and max_col are 1-based instead of 0-based, again see openpyxl for more details.
See related SQLDataModel.to_excel() for exporting an existing SQLDataModel to Excel.
- Changelog:
- Version 0.2.2 (2024-03-26):
New method.
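The 1-based, inclusive bounds described in the note above can be sketched with plain Python lists. This is only an illustration of the index arithmetic, not the package's or openpyxl's implementation; the helper name select_window and the sample rows are hypothetical.

```python
def select_window(rows, min_row=1, max_row=None, min_col=1, max_col=None):
    """Return the sub-grid bounded by 1-based, inclusive row and column limits."""
    if min_row < 1 or min_col < 1:
        raise ValueError("row and column bounds are 1-based and must be >= 1")
    # A 1-based inclusive range [lo, hi] maps to the Python slice [lo - 1 : hi]
    selected_rows = rows[min_row - 1 : max_row]
    return [row[min_col - 1 : max_col] for row in selected_rows]


# Hypothetical worksheet contents, header row first
sheet = [
    ["Name", "Age", "Gender", "City"],
    ["John", 25, "Male", "Boston"],
    ["Alice", 30, "Female", "Milwaukee"],
]

# Middle two columns only, mirroring min_col=2, max_col=3
middle = select_window(sheet, min_col=2, max_col=3)

# Skip the original header row, mirroring min_row=2
body = select_window(sheet, min_row=2)
```

Leaving max_row or max_col as None keeps everything up to the end, which is why the default min_row of 1 includes the original header row.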
- classmethod from_html(html_source: str, encoding: str = 'utf-8', table_identifier: int | str = 1, infer_types: bool = True, **kwargs) SQLDataModel[source]
Parses an HTML table element from one of three possible sources: a web page at a URL, a local file at a path, or a raw HTML string literal. If table_identifier is not specified, the first <table> element successfully parsed is returned. If table_identifier is a str, the parser returns the table whose ‘id’ or ‘name’ HTML attribute matches the identifier. If table_identifier is an int, the parser returns the table at that sequential position, counting <table> elements from the top of the page down and starting at ‘1’ for the first table found.
- Parameters:
html_source (str) – The HTML source, which can be a URL, a valid path to an HTML file, or a raw HTML string. If it starts with ‘http’, the argument is treated as a URL and the table is parsed from the response to the web request. If it is a valid file path, the argument is treated as a local file and the table is parsed from its HTML. Otherwise, the argument is treated as a raw HTML string and the table is parsed directly from the input.
encoding (str) – The encoding to use for reading HTML when html_source is considered a valid URL or file path (default is ‘utf-8’).
table_identifier (int | str) – An identifier to specify which table to parse if there are multiple tables in the HTML source. Default is 1, returning the first table element found. If an int, the identifier is treated as the position of the <table> element on the page from top to bottom, starting from one. If a str, the identifier is treated as a target HTML ‘id’ or ‘name’ attribute and the first case-insensitive match is returned.
infer_types (bool, optional) – Whether column data types should be inferred in the returned model. Default is True, meaning column types will be inferred; otherwise columns are returned as ‘str’ types.
**kwargs – Additional keyword arguments to pass when using urllib.request.urlopen to fetch HTML from a URL.
- Returns:
A new SQLDataModel instance containing the data from the parsed HTML table.
- Return type:
SQLDataModel
- Raises:
TypeError – If html_source is not of type str representing a possible URL, file path or raw HTML stream.
HTTPError – Raised from urllib when html_source is considered a URL and an HTTP exception occurs.
URLError – Raised from urllib when html_source is considered a URL and a URL exception occurs.
ValueError – If no <table> elements are found or if the targeted table_identifier is not found.
OSError – Related exceptions that may be raised when html_source is considered a file path.
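The three-way handling of html_source described above can be pictured with a small dispatcher. This is a sketch of the documented rules only (prefix check, then path check, then raw fallback); the function name classify_source is hypothetical and the real method may apply additional checks.

```python
import os


def classify_source(html_source):
    """Decide how a source string would be treated: 'url', 'file' or 'raw'."""
    if not isinstance(html_source, str):
        raise TypeError("html_source must be a str")
    # Anything starting with 'http' is treated as a web address
    if html_source.startswith("http"):
        return "url"
    # An existing file on disk is read and parsed as HTML
    if os.path.isfile(html_source):
        return "file"
    # Everything else is parsed directly as raw HTML
    return "raw"


kind_url = classify_source("https://en.wikipedia.org/wiki/FIFA_World_Cup")
kind_raw = classify_source("<table><tr><td>A</td></tr></table>")
```

One consequence of this ordering is that a mistyped file path silently falls through to the raw HTML branch, where it would most likely fail with the "no <table> elements found" ValueError rather than an OSError.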
Examples:
From Website URL
import sqldatamodel as sdm

# From URL
url = 'https://en.wikipedia.org/wiki/1998_FIFA_World_Cup'

# Let's get the 95th table from the 1998 World Cup
df = sdm.from_html(url, table_identifier=95)

# View result:
print(df)
This will output:
┌────┬─────────────┬────┬────┬────┬────┬────┬────┬────┬─────┬──────┐ │ R │ Team │ G │ P │ W │ D │ L │ GF │ GA │ GD │ Pts. │ ├────┼─────────────┼────┼────┼────┼────┼────┼────┼────┼─────┼──────┤ │ 1 │ France │ C │ 7 │ 6 │ 1 │ 0 │ 15 │ 2 │ +13 │ 19 │ │ 2 │ Brazil │ A │ 7 │ 4 │ 1 │ 2 │ 14 │ 10 │ +4 │ 13 │ │ 3 │ Croatia │ H │ 7 │ 5 │ 0 │ 2 │ 11 │ 5 │ +6 │ 15 │ │ 4 │ Netherlands │ E │ 7 │ 3 │ 3 │ 1 │ 13 │ 7 │ +6 │ 12 │ │ 5 │ Italy │ B │ 5 │ 3 │ 2 │ 0 │ 8 │ 3 │ +5 │ 11 │ │ 6 │ Argentina │ H │ 5 │ 3 │ 1 │ 1 │ 10 │ 4 │ +6 │ 10 │ │ 7 │ Germany │ F │ 5 │ 3 │ 1 │ 1 │ 8 │ 6 │ +2 │ 10 │ │ 8 │ Denmark │ C │ 5 │ 2 │ 1 │ 2 │ 9 │ 7 │ +2 │ 7 │ └────┴─────────────┴────┴────┴────┴────┴────┴────┴────┴─────┴──────┘ [8 rows x 11 columns]
From Local File
import sqldatamodel as sdm

# From HTML file
df = sdm.from_html('path/to/file.html')

# View output
print(df)
This will output:
┌─────────────┬────────┬──────┐ │ Team │ Points │ Rank │ ├─────────────┼────────┼──────┤ │ Brazil │ 63.7 │ 1 │ │ England │ 50.7 │ 2 │ │ Spain │ 50.0 │ 3 │ │ Germany [a] │ 49.3 │ 4 │ │ Mexico │ 47.3 │ 5 │ │ France │ 46.0 │ 6 │ │ Italy │ 44.3 │ 7 │ │ Argentina │ 44.0 │ 8 │ └─────────────┴────────┴──────┘ [8 rows x 3 columns]
From Raw HTML
import sqldatamodel as sdm

# Raw HTML
raw_html = '''<table id="find-me">
    <tr>
        <th>Col 1</th>
        <th>Col 2</th>
    </tr>
    <tr>
        <td>A</td>
        <td>1</td>
    </tr>
    <tr>
        <td>B</td>
        <td>2</td>
    </tr>
    <tr>
        <td>C</td>
        <td>3</td>
    </tr>
</table>'''

# Create the model and search for id attribute
df = sdm.from_html(raw_html, table_identifier="find-me")

# View output
print(df)
This will output:
┌───┬───────┬───────┐ │ │ Col 1 │ Col 2 │ ├───┼───────┼───────┤ │ 0 │ A │ 1 │ │ 1 │ B │ 2 │ │ 2 │ C │ 3 │ └───┴───────┴───────┘ [3 rows x 2 columns]
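To make the two kinds of table_identifier concrete, here is a minimal sketch built on the standard library's html.parser. It is not the package's SQLDataModel.HTMLParser; the names TableGrabber and pick_table are hypothetical, and nested tables are deliberately not handled.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collects the cell text of every <table>, along with its attributes."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of (attributes dict, list of rows)
        self._row = None   # row currently being filled, if any
        self._cell = None  # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append((dict(attrs), []))
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1][1].append(self._row)
            self._row = None


def pick_table(html, table_identifier=1):
    """Return the rows of the table matched by position (int) or id/name (str)."""
    parser = TableGrabber()
    parser.feed(html)
    if not parser.tables:
        raise ValueError("no <table> elements found")
    if isinstance(table_identifier, int):
        # Integer identifiers are one-based positions from the top of the page
        if not 1 <= table_identifier <= len(parser.tables):
            raise ValueError(f"table {table_identifier} not found")
        return parser.tables[table_identifier - 1][1]
    wanted = table_identifier.lower()
    for attrs, rows in parser.tables:
        # String identifiers match the 'id' or 'name' attribute, ignoring case
        names = ((attrs.get("id") or "").lower(), (attrs.get("name") or "").lower())
        if wanted in names:
            return rows
    raise ValueError(f"table {table_identifier!r} not found")


page = """
<table><tr><th>X</th></tr><tr><td>9</td></tr></table>
<table id="find-me"><tr><th>Col 1</th><th>Col 2</th></tr><tr><td>A</td><td>1</td></tr></table>
"""
by_position = pick_table(page, 2)
by_name = pick_table(page, "FIND-ME")
```

Both calls select the same table: one by its one-based position, the other by a case-insensitive match on its id attribute.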
Note
**kwargs passed to the method are used in urllib.request.urlopen if html_source is being considered as a web URL.
**kwargs passed to the method are used in open if html_source is being considered as a file path.
The largest row size encountered will be used as the column_count for the returned SQLDataModel; shorter rows will be padded with None.
See utils.generate_html_table_chunks() for the initial source chunking before content is fed to SQLDataModel.HTMLParser.
- Changelog:
- Version 0.9.0 (2024-06-26):
Modified table_identifier default value to 1, changing from zero-based to one-based indexing for referencing the target table in the source, to align with similar extraction methods throughout the package.
- Version 0.1.9 (2024-03-19):
New method.
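The padding rule from the note above (the widest row sets the column count, shorter rows are filled with None) can be sketched as follows. The helper name pad_rows is hypothetical and this is only an illustration of the rule, not the package's code.

```python
def pad_rows(rows):
    """Right-pad every row with None up to the width of the widest row."""
    column_count = max(len(row) for row in rows)
    return [list(row) + [None] * (column_count - len(row)) for row in rows]


ragged = [["Team", "Points", "Rank"], ["Brazil", "63.7"], ["England"]]
padded = pad_rows(ragged)
```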
- classmethod from_json(json_source: str | list | dict, encoding: str = 'utf-8', **kwargs) SQLDataModel[source]
Creates a new SQLDataModel instance from a JSON file path or JSON-like source, flattening if required.
- Parameters:
json_source (str | list | dict) – The JSON source. If a string, it can represent a file path or a JSON-like object.
encoding (str) – The encoding to use when reading from a file. Defaults to ‘utf-8’.
**kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.
- Returns:
A new SQLDataModel instance created from the JSON source.
- Return type:
SQLDataModel
- Raises:
TypeError – If the json_source argument is not of type ‘str’, ‘list’, or ‘dict’.
OSError – If a related exception occurs when trying to open and read from json_source as a file path.
Examples:
From JSON String Literal
import sqldatamodel as sdm

# Sample JSON string
json_data = '''[{
    "id": 1,
    "color": "red",
    "value": "#f00"
}, {
    "id": 2,
    "color": "green",
    "value": "#0f0"
}, {
    "id": 3,
    "color": "blue",
    "value": "#00f"
}]'''

# Create the model
df = sdm.from_json(json_data)

# View result
print(df)
This will output:
┌──────┬───────┬───────┐ │ id │ color │ value │ ├──────┼───────┼───────┤ │ 1 │ red │ #f00 │ │ 2 │ green │ #0f0 │ │ 3 │ blue │ #00f │ └──────┴───────┴───────┘ [3 rows x 3 columns]
From JSON-like Object
import sqldatamodel as sdm

# JSON-like sample
json_data = [
    {"alpha": "A", "value": "1"},
    {"alpha": "B", "value": "2"},
    {"alpha": "C", "value": "3"},
]

# Create the model
df = sdm.from_json(json_data)

# Output
print(df)
This will output:
┌───────┬───────┐ │ alpha │ value │ ├───────┼───────┤ │ A │ 1 │ │ B │ 2 │ │ C │ 3 │ └───────┴───────┘ [3 rows x 2 columns]
From JSON file
import sqldatamodel as sdm

# JSON file path
json_data = 'data/json-sample.json'

# Create the model
df = sdm.from_json(json_data, encoding='latin-1')

# View output
print(df)
This will output:
┌──────┬────────┬───────┬─────────┐ │ id │ color │ value │ notes │ ├──────┼────────┼───────┼─────────┤ │ 1 │ red │ #f00 │ primary │ │ 2 │ green │ #0f0 │ │ │ 3 │ blue │ #00f │ primary │ │ 4 │ cyan │ #0ff │ │ │ 5 │ yellow │ #ff0 │ │ │ 5 │ black │ #000 │ │ └──────┴────────┴───────┴─────────┘ [6 rows x 4 columns]
Note
If json_source is deeply nested, it will be flattened according to the staticmethod utils.flatten_json().
If json_source is a JSON-like string object that is not an array, it will be wrapped in an array.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
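The two behaviours in the note above, flattening nested objects and wrapping a lone object in an array, can be illustrated with the standard json module. This is a sketch under assumptions: the helper names flatten and to_records are hypothetical, the underscore-joined key naming is a guess rather than what utils.flatten_json() produces, and file paths are not handled here.

```python
import json


def flatten(record, parent="", sep="_"):
    """Flatten nested dicts into one level, joining key names with sep."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def to_records(json_source):
    """Return a list of flat dicts from a JSON string, dict or list of dicts."""
    data = json.loads(json_source) if isinstance(json_source, str) else json_source
    if isinstance(data, dict):
        # A single object is wrapped so that it becomes a one-row table
        data = [data]
    return [flatten(record) for record in data]


single = to_records('{"id": 1, "meta": {"color": "red", "value": "#f00"}}')
several = to_records([{"alpha": "A", "value": "1"}, {"alpha": "B", "value": "2"}])
```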
- classmethod from_latex(latex_source: str, table_identifier: int = 1, encoding: str = 'utf-8', **kwargs) SQLDataModel[source]
Creates a new SQLDataModel instance from the provided LaTeX file or raw literal.
- Parameters:
latex_source (str) – The LaTeX source containing one or more LaTeX tables. If latex_source is a valid system file path, the source will be treated as a .tex file and parsed. If latex_source is not a valid file path, the source will be parsed as a raw LaTeX literal.
table_identifier (int, optional) – The index position of the LaTeX table to extract. Default is 1.
encoding (str, optional) – The file encoding to use if the source is a LaTeX file path. Default is ‘utf-8’.
**kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel instance created from the parsed LaTeX table.
- Return type:
SQLDataModel
- Raises:
TypeError – If the latex_source argument is not of type ‘str’, or if the table_identifier argument is not of type ‘int’.
ValueError – If the table_identifier argument is less than 1, or if no tables are found in the LaTeX source.
IndexError – If the table_identifier is greater than the number of tables found in the LaTeX source.
- Table Indices:
In the last example below, df will contain the data from the second table found in the LaTeX content.
Tables are indexed starting from index 1 at the top of the LaTeX content, incremented as they are found.
LaTeX parsing stops after the table specified at table_identifier is found, without parsing the remaining content.
Examples:
From LaTeX literal
import sqldatamodel as sdm

# Raw LaTeX literal (a raw string keeps backslashes such as \b in \begin intact)
latex_content = r'''
\begin{tabular}{|l|r|r|}
\hline
    {Name} & {Age} & {Height} \\
\hline
    John & 30 & 175.30 \\
    Alice & 28 & 162.00 \\
    Michael & 35 & 185.80 \\
\hline
\end{tabular}
'''

# Create the model from the LaTeX
df = sdm.from_latex(latex_content)

# View result
print(df)
This will output:
┌─────────┬──────┬─────────┐ │ Name │ Age │ Height │ ├─────────┼──────┼─────────┤ │ John │ 30 │ 175.30 │ │ Alice │ 28 │ 162.00 │ │ Michael │ 35 │ 185.80 │ └─────────┴──────┴─────────┘ [3 rows x 3 columns]
From LaTeX file
import sqldatamodel as sdm

# Load LaTeX content from file
latex_file = 'path/to/latex/file.tex'

# Create the model using the path
df = sdm.from_latex(latex_file)
Specifying table identifier
import sqldatamodel as sdm

# Raw LaTeX literal with multiple tables (raw string keeps backslashes intact)
latex_content = r'''
%% LaTeX with a Table
\begin{tabular}{|l|l|}
\hline
    {Header A} & {Header B} \\
\hline
    Value A1 & Value B1 \\
    Value A2 & Value B2 \\
\hline
\end{tabular}

%% Then another Table
\begin{tabular}{|l|l|}
\hline
    {Header X} & {Header Y} \\
\hline
    Value X1 & Value Y1 \\
    Value X2 & Value Y2 \\
\hline
\end{tabular}
'''

# Create the model from the 2nd table
df = sdm.from_latex(latex_content, table_identifier=2)

# View output
print(df)
This will output:
┌──────────┬──────────┐ │ Header X │ Header Y │ ├──────────┼──────────┤ │ Value X1 │ Value Y1 │ │ Value X2 │ Value Y2 │ └──────────┴──────────┘ [2 rows x 2 columns]
Note
LaTeX tables are identified based on the presence of tabular environments: \begin{tabular}...\end{tabular}.
The table_identifier specifies which table to extract when multiple tables are present, beginning at position ‘1’ from the top of the source.
The provided kwargs are passed to the SQLDataModel constructor for additional parameters to the instance returned.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
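A rough picture of how tabular environments can be located by one-based position, stopping as soon as the requested one is reached, is sketched below with a regular expression. The names TABULAR, parse_rows and nth_tabular are hypothetical, only simple column specifications and cells are handled, and the package's own parser may work quite differently.

```python
import re

# Matches the body of one tabular environment with a simple column spec
TABULAR = re.compile(r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", re.DOTALL)


def parse_rows(body):
    """Split a tabular body into rows on the row terminator and cells on '&'."""
    rows = []
    for chunk in body.split(r"\\"):
        chunk = chunk.replace(r"\hline", "").strip()
        if chunk:
            rows.append([cell.strip().strip("{}").strip() for cell in chunk.split("&")])
    return rows


def nth_tabular(latex, table_identifier=1):
    """Return the rows of the table at the given one-based position."""
    if not isinstance(table_identifier, int):
        raise TypeError("table_identifier must be an int")
    if table_identifier < 1:
        raise ValueError("table_identifier must be >= 1")
    found = 0
    # finditer is lazy, so scanning stops once the requested table is reached
    for found, match in enumerate(TABULAR.finditer(latex), start=1):
        if found == table_identifier:
            return parse_rows(match.group(1))
    if found == 0:
        raise ValueError("no tabular environments found")
    raise IndexError(f"only {found} table(s) found")


sample = r"""
%% LaTeX with a Table
\begin{tabular}{|l|l|}
\hline
{Header A} & {Header B} \\
\hline
Value A1 & Value B1 \\
\hline
\end{tabular}
%% Then another Table
\begin{tabular}{|l|l|}
\hline
{Header X} & {Header Y} \\
\hline
Value X1 & Value Y1 \\
Value X2 & Value Y2 \\
\hline
\end{tabular}
"""
second = nth_tabular(sample, table_identifier=2)
```

The sample is written as a raw string so that sequences such as \b in \begin are not interpreted as Python escape characters.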
- classmethod from_markdown(markdown_source: str, table_identifier: int = 1, **kwargs) SQLDataModel[source]
Creates a new SQLDataModel instance from the provided Markdown source file or raw content.
If markdown_source is a valid system path, the markdown file will be parsed. Otherwise, the provided string will be parsed as raw markdown.
- Parameters:
markdown_source (str) – The Markdown source file path or raw content.
table_identifier (int, optional) – The index position of the markdown table to extract. Default is 1.
**kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.
- Raises:
TypeError – If the markdown_source argument is not of type ‘str’, or if the table_identifier argument is not of type ‘int’.
ValueError – If the table_identifier argument is less than 1, or if no tables are found in the markdown source.
IndexError – If the table_identifier is greater than the number of tables found in the markdown source.
- Returns:
The SQLDataModel instance created from the parsed markdown table.
- Return type:
SQLDataModel
- Table Indices:
In the last example below, df will contain the data from the second table found in the markdown content.
Tables are indexed starting from index 1 at the top of the markdown content, incremented as they are found.
Markdown parsing stops after the table specified at table_identifier is found, without parsing the remaining content.
Examples:
From Markdown Literal
import sqldatamodel as sdm

# Raw markdown literal
markdown_content = '''
| Item          | Price | # In stock |
|---------------|-------|------------|
| Juicy Apples  | 1.99  | 37         |
| Bananas       | 1.29  | 52         |
| Pineapple     | 3.15  | 14         |
'''

# Create the model from the markdown
df = sdm.from_markdown(markdown_content)

# View result
print(df)
This will output:
┌──────────────┬───────┬────────────┐ │ Item │ Price │ # In stock │ ├──────────────┼───────┼────────────┤ │ Juicy Apples │ 1.99 │ 37 │ │ Bananas │ 1.29 │ 52 │ │ Pineapple │ 3.15 │ 14 │ └──────────────┴───────┴────────────┘ [3 rows x 3 columns]
From Markdown File
import sqldatamodel as sdm

# Load markdown content from file
markdown_file_path = 'path/to/markdown_file.md'

# Create the model using the path
df = sdm.from_markdown(markdown_file_path)
Specifying Table Identifier
import sqldatamodel as sdm

# Raw markdown literal with multiple tables
markdown_content = '''
### Markdown with a Table

| Header A | Header B |
|----------|----------|
| Value A1 | Value B1 |
| Value A2 | Value B2 |

### Then another Table

| Header X | Header Y |
|----------|----------|
| Value X1 | Value Y1 |
| Value X2 | Value Y2 |
'''

# Create the model from the 2nd table
df = sdm.from_markdown(markdown_content, table_identifier=2)

# View output
print(df)
This will output:
┌──────────┬──────────┐ │ Header X │ Header Y │ ├──────────┼──────────┤ │ Value X1 │ Value Y1 │ │ Value X2 │ Value Y2 │ └──────────┴──────────┘ [2 rows x 2 columns]
Note
Markdown tables are identified based on the presence of pipe characters | defining table cells.
The table_identifier specifies which table to extract when multiple tables are present, beginning at position ‘1’ from the top of the source.
Escaped pipe characters \| within the markdown are replaced with the equivalent HTML entity reference for the pipe character so that they are not parsed as cell separators.
The provided kwargs are passed to the SQLDataModel constructor for additional parameters to the instance returned.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
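The points in the note above (pipe-delimited rows, one-based table positions, escaped pipes protected before splitting) can be sketched as follows. The helper names are hypothetical, the choice of the numeric entity &#124; as the placeholder is an assumption, and unlike the documented behaviour this simplified version scans the whole source before selecting a table.

```python
import html


def find_tables(markdown):
    """Group consecutive lines that start with a pipe into candidate tables."""
    tables, current = [], []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            current.append(stripped)
        elif current:
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


def parse_table(lines):
    """Split table lines into cells, skipping the header separator row."""
    rows = []
    for line in lines:
        # Protect escaped pipes so they are not mistaken for cell borders
        protected = line.replace("\\|", "&#124;")
        cells = [cell.strip() for cell in protected.strip("|").split("|")]
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # separator row such as |-----|:---:|
        rows.append([html.unescape(cell) for cell in cells])
    return rows


def nth_table(markdown, table_identifier=1):
    """Return the rows of the table at the given one-based position."""
    if table_identifier < 1:
        raise ValueError("table_identifier must be >= 1")
    tables = find_tables(markdown)
    if not tables:
        raise ValueError("no tables found")
    if table_identifier > len(tables):
        raise IndexError(f"only {len(tables)} table(s) found")
    return parse_table(tables[table_identifier - 1])


sample = """
### Markdown with a Table
| Header A | Header B |
|----------|----------|
| a \\| b   | Value B1 |

### Then another Table
| Header X | Header Y |
|----------|----------|
| Value X1 | Value Y1 |
"""
first = nth_table(sample)
second = nth_table(sample, table_identifier=2)
```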
- classmethod from_numpy(array, headers: list[str] = None, **kwargs) SQLDataModel[source]
Returns a SQLDataModel object created from the provided numpy array.
- Parameters:
array (numpy.ndarray) – The numpy array to convert to a SQLDataModel.
headers (list of str, optional) – The list of headers to use for the SQLDataModel. If None, no headers will be used, and the data will be treated as an n-dimensional array. Default is None.
**kwargs – Additional arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the numpy array.
- Return type:
SQLDataModel
- Raises:
ModuleNotFoundError – If the required package numpy is not found.
TypeError – If the array argument is not of type numpy.ndarray.
DimensionError – If array.ndim != 2, meaning the array does not represent a (row, column) tabular array.
Example:
import numpy as np
import sqldatamodel as sdm

# Sample array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Create the model with custom headers
df = sdm.from_numpy(arr, headers=['Col A', 'Col B', 'Col C'])

# View output
print(df)
This will output:
┌───────┬───────┬───────┐ │ Col A │ Col B │ Col C │ ├───────┼───────┼───────┤ │ 1 │ 2 │ 3 │ │ 4 │ 5 │ 6 │ │ 7 │ 8 │ 9 │ └───────┴───────┴───────┘ [3 rows x 3 columns]
Note
Numpy array must have ‘2’ dimensions, the first representing the rows, and the second the columns.
If no headers are provided, default headers will be generated as ‘col_N’ where N represents the column integer index.
- Changelog:
- Version 0.1.3 (2023-10-15):
New method.
- classmethod from_pandas(df, headers: list[str] = None, **kwargs) SQLDataModel[source]
Returns a SQLDataModel object created from the provided df representing a Pandas DataFrame object. Note that pandas must be installed in order to use this method.
- Parameters:
df (pandas.DataFrame) – The pandas DataFrame to convert to a SQLDataModel.
headers (list[str], optional) – The list of headers to use for the SQLDataModel. Default is None, using the columns from the df object.
**kwargs – Additional arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the pandas DataFrame.
- Return type:
SQLDataModel
- Raises:
ModuleNotFoundError – If the required package pandas is not found.
TypeError – If the df argument is not of type pandas.DataFrame.
Example:
import pandas as pd
import sqldatamodel as sdm

# Create a pandas DataFrame
df_pd = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})

# Create the model
df_sdm = sdm.from_pandas(df_pd)
Note
If headers are not provided, the existing pandas columns will be used as the new SQLDataModel headers.
- Changelog:
- Version 0.1.3 (2023-10-15):
New method.
- classmethod from_parquet(filename: str, **kwargs) SQLDataModel[source]
Returns a new SQLDataModel instance from the specified parquet file.
- Parameters:
filename (str) – The file path to the parquet file, e.g., filename = 'user/data/titanic.parquet'.
**kwargs – Additional keyword arguments to pass to the pyarrow read_table function, e.g., filters = [('Name','=','Alice')].
- Returns:
A new instance of SQLDataModel created from the parquet file.
- Return type:
SQLDataModel
- Raises:
ModuleNotFoundError – If the required package pyarrow is not installed, as determined by the optionals._has_pa flag.
TypeError – If the filename argument is not of type ‘str’ representing a valid parquet file path.
FileNotFoundError – If the specified parquet filename is not found.
Exception – If any unexpected exception occurs during the file or parquet reading process.
Example:
import sqldatamodel as sdm

# Sample parquet file
pq_file = "titanic.parquet"

# Create the model
df = sdm.from_parquet(pq_file)

# View column counts
print(df.count())
This will output:
┌────┬─────────────┬──────┬────────┬───────┬───────┐ │ │ column │ na │ unique │ count │ total │ ├────┼─────────────┼──────┼────────┼───────┼───────┤ │ 0 │ PassengerId │ 0 │ 891 │ 891 │ 891 │ │ 1 │ Survived │ 0 │ 2 │ 891 │ 891 │ │ 2 │ Pclass │ 0 │ 3 │ 891 │ 891 │ │ 3 │ Name │ 0 │ 891 │ 891 │ 891 │ │ 4 │ Sex │ 0 │ 2 │ 891 │ 891 │ │ 5 │ Age │ 177 │ 88 │ 714 │ 891 │ │ 6 │ SibSp │ 0 │ 7 │ 891 │ 891 │ │ 7 │ Parch │ 0 │ 7 │ 891 │ 891 │ │ 8 │ Ticket │ 0 │ 681 │ 891 │ 891 │ │ 9 │ Fare │ 0 │ 248 │ 891 │ 891 │ │ 10 │ Cabin │ 687 │ 147 │ 204 │ 891 │ │ 11 │ Embarked │ 2 │ 3 │ 889 │ 891 │ └────┴─────────────┴──────┴────────┴───────┴───────┘ [12 rows x 5 columns]
Note
The pyarrow package is required to use this method as well as the SQLDataModel.to_parquet() method.
Once the file is read into pyarrow.parquet, the to_pydict() method is used to pass the data to this package’s SQLDataModel.from_dict() method.
The Titanic parquet data used in the example is available at https://www.kaggle.com/code/taruntiwarihp/titanic-dataset
- classmethod from_pickle(filename: str = None, **kwargs) SQLDataModel[source]
Returns the SQLDataModel object from the provided filename. If None, the current directory will be scanned for the default SQLDataModel.to_pickle() format.
- Parameters:
filename (str, optional) – The name of the pickle file to load. If None, the current directory will be scanned for the default filename. Default is None.
**kwargs – Additional arguments to be passed to the SQLDataModel constructor; these will override the properties loaded from filename.
- Returns:
The SQLDataModel object created from the loaded pickle file.
- Return type:
SQLDataModel
- Raises:
TypeError – If filename is provided but is not of type ‘str’ representing a valid pickle filepath.
FileNotFoundError – If the provided filename could not be found or does not exist.
Example:
import sqldatamodel as sdm

headers = ['Name', 'Age', 'Sex']
data = [('Alice', 20, 'F'), ('Bob', 25, 'M'), ('Gerald', 30, 'M')]

# Create the model with sample data
df = sdm.SQLDataModel(data=data, headers=headers)

# Filepath
pkl_file = 'people.sdm'

# Save the model
df.to_pickle(filename=pkl_file)

# Load it back from file
df = sdm.from_pickle(filename=pkl_file)
Note
All data, headers, data types and display properties will be saved when pickling.
Any additional kwargs provided will override those saved in the pickled model.
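The override rule in the note above amounts to merging the saved properties with the keyword arguments, with the keyword arguments winning. The sketch below shows this with the standard pickle module; the dictionary layout and the names save_model and load_model are assumptions for illustration, since the actual on-disk format used by to_pickle() is not described here.

```python
import os
import pickle
import tempfile


def save_model(path, **properties):
    """Persist a dict of model properties (hypothetical layout)."""
    with open(path, "wb") as handle:
        pickle.dump(properties, handle)


def load_model(path, **overrides):
    """Load saved properties; any keyword argument replaces the stored value."""
    with open(path, "rb") as handle:
        saved = pickle.load(handle)
    # Later keys win, so overrides take precedence over what was pickled
    return {**saved, **overrides}


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "people.sdm")
    save_model(path, headers=["Name", "Age"], data=[("Alice", 20)], display_index=True)
    restored = load_model(path, display_index=False)
```

As with any pickle-based format, files should only be loaded from trusted sources, because unpickling can execute arbitrary code.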
- classmethod from_polars(df, headers: list[str] = None, **kwargs) SQLDataModel[source]
Returns a SQLDataModel object created from the provided df representing a Polars DataFrame object. Note that polars must be installed in order to use this method.
- Parameters:
df (polars.DataFrame) – The Polars DataFrame to convert to a SQLDataModel.
headers (list[str], optional) – The list of headers to use for the SQLDataModel. Default is None, using the columns from the df object.
**kwargs – Additional arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the Polars DataFrame.
- Return type:
SQLDataModel
- Raises:
ModuleNotFoundError – If the required package polars is not found.
TypeError – If the df argument is not of type polars.DataFrame.
Example:
import polars as pl
import sqldatamodel as sdm

# Sample data
data = {
    'Name': ['Beth', 'John', 'Alice', 'Travis'],
    'Age': [27, 30, 28, 35],
    'Height': [172.4, 175.3, 162.0, 185.8]
}

# Create the polars DataFrame
df_pl = pl.DataFrame(data)

# Create a SQLDataModel object
df_sdm = sdm.from_polars(df_pl)

# View result
print(df_sdm)
This will output a SQLDataModel constructed from the Polars df_pl:
┌────────┬─────┬─────────┐ │ Name │ Age │ Height │ ├────────┼─────┼─────────┤ │ Beth │ 27 │ 172.40 │ │ John │ 30 │ 175.30 │ │ Alice │ 28 │ 162.00 │ │ Travis │ 35 │ 185.80 │ └────────┴─────┴─────────┘ [4 rows x 3 columns]
Note
If headers are not provided, the columns from the provided DataFrame will be used as the new SQLDataModel headers.
Polars uses different data types than those used by SQLDataModel; see SQLDataModel.set_column_dtypes() for specific casting rules.
See related SQLDataModel.to_polars() for the inverse method of converting a SQLDataModel into a Polars DataFrame object.
- Changelog:
- Version 0.3.8 (2024-04-12):
New method.
- classmethod from_pyarrow(table, **kwargs) SQLDataModel[source]
Returns a new SQLDataModel instance from the provided Apache Arrow object.
- Parameters:
table (pyarrow.lib.Table) – Apache Arrow object from which to construct a new SQLDataModel object.
**kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.
- Raises:
ModuleNotFoundError – If the required package pyarrow is not installed.
TypeError – If the provided table argument is not of type ‘pyarrow.lib.Table’.
- Returns:
A new SQLDataModel instance representing the data in the provided Apache Arrow object.
- Return type:
SQLDataModel
Example:
import pyarrow as pa
import sqldatamodel as sdm

# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Grade': [3.8, 3.9, 3.2],
}

# Create PyArrow table from data
table = pa.Table.from_pydict(data)

# Create model from PyArrow table
df = sdm.from_pyarrow(table)

# View result
print(df)
This will output:
┌─────────┬──────┬───────┐ │ Name │ Age │ Grade │ ├─────────┼──────┼───────┤ │ Alice │ 25 │ 3.80 │ │ Bob │ 30 │ 3.90 │ │ Charlie │ 35 │ 3.20 │ └─────────┴──────┴───────┘ [3 rows x 3 columns]
Note
To convert an existing SQLDataModel instance to Apache Arrow format, see SQLDataModel.to_pyarrow().
This method is only for in-memory Apache Arrow table objects; for reading and writing parquet see SQLDataModel.from_parquet().
- Changelog:
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.2.3 (2024-03-28):
New method.
- classmethod from_shape(shape: tuple[int, int], fill: Any = None, headers: list[str] = None, dtype: Literal['bytes', 'date', 'datetime', 'float', 'int', 'str'] = None, **kwargs) SQLDataModel[source]
Returns a SQLDataModel from shape (M rows, N columns) as a convenience method to quickly build a model through an iterative approach. By default, no particular data type is assigned given the flexibility of sqlite3; however, one can be inferred by providing an initial fill value or set explicitly by providing the dtype argument.
- Parameters:
shape (tuple[int, int]) – The shape to initialize the SQLDataModel with as (M, N) where M is the number of rows and N is the number of columns.
fill (Any, optional) – The scalar fill value to populate the new SQLDataModel with. Default is None, using SQL null values or deriving from dtype if provided.
headers (list[str], optional) – The headers to use for the model. Default is None, incrementing headers 0, 1, ..., N-1 where N is the number of columns.
dtype (str, optional) – A valid python or SQL datatype to initialize the n-dimensional model with. Default is None, using the SQL text type.
**kwargs – Additional keyword arguments to pass to the SQLDataModel constructor.
- Raises:
TypeError – If M or N are not of type ‘int’ representing a valid shape to initialize a SQLDataModel with.
ValueError – If M or N are not positive integer values representing valid nonzero row and column dimensions.
ValueError – If dtype is not a valid python or SQL convertible datatype to initialize the model with.
- Returns:
Instance with the specified number of rows and columns, initialized with dtype-derived fill values or with None values (default).
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Create a 3x3 model filled by 'X'
df = sdm.from_shape((3,3), fill='X')

# View it
print(df)
This will output a 3x3 grid of ‘X’ characters:
┌───┬─────┬─────┬─────┐ │ │ 0 │ 1 │ 2 │ ├───┼─────┼─────┼─────┤ │ 0 │ X │ X │ X │ │ 1 │ X │ X │ X │ │ 2 │ X │ X │ X │ └───┴─────┴─────┴─────┘ [3 rows x 3 columns]
We can iteratively build the model from the shape dimensions:
import sqldatamodel as sdm

# Define shape
shape = (6,6)

# Initialize the multiplication table with integer dtypes
mult_table = sdm.from_shape(shape=shape, dtype='int')

# Construct the table values
for x in range(shape[0]):
    for y in range(shape[1]):
        mult_table[x, y] = x * y

# View the multiplication table
print(mult_table)
This will output our 6x6 multiplication table:
┌───┬─────┬─────┬─────┬─────┬─────┬─────┐ │ │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ ├───┼─────┼─────┼─────┼─────┼─────┼─────┤ │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ │ 1 │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ │ 2 │ 0 │ 2 │ 4 │ 6 │ 8 │ 10 │ │ 3 │ 0 │ 3 │ 6 │ 9 │ 12 │ 15 │ │ 4 │ 0 │ 4 │ 8 │ 12 │ 16 │ 20 │ │ 5 │ 0 │ 5 │ 10 │ 15 │ 20 │ 25 │ └───┴─────┴─────┴─────┴─────┴─────┴─────┘ [6 rows x 6 columns]
Note
If both fill and dtype are provided, the data type will be derived from type(fill), overriding the specified dtype.
If only dtype is provided, sensible default fill values will be used to populate the model, such as 0 or 0.0 for numeric types and an empty string or null for others.
For data types not natively implemented by sqlite3, such as date and datetime, today’s date and the current datetime will be used respectively as initialization values.
- Changelog:
- Version 0.5.2 (2024-05-13):
Added shape parameter in lieu of separate n_rows and n_cols arguments.
Added fill parameter to populate the resulting SQLDataModel with values to override type-specific initialization defaults.
Added headers parameter to explicitly set column names when creating the SQLDataModel.
Added **kwargs parameter to align more closely with usage patterns of other model initializing constructor methods.
- Version 0.1.9 (2024-03-19):
New method.
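The interaction between shape, fill and dtype described above can be sketched with nested lists. This is an illustration of the documented rules rather than the package's code: the name make_grid is hypothetical, and the specific defaults chosen for str and bytes are assumptions (the note only says an empty string or null is used for non-numeric types).

```python
import datetime

# Hypothetical per-dtype defaults; callables so dates are evaluated when used
_DEFAULT_FILL = {
    "int": lambda: 0,
    "float": lambda: 0.0,
    "str": lambda: "",
    "bytes": lambda: b"",
    "date": datetime.date.today,
    "datetime": datetime.datetime.now,
}


def make_grid(shape, fill=None, dtype=None):
    """Build an M x N grid of fill values, deriving a default from dtype if needed."""
    n_rows, n_cols = shape
    if not (isinstance(n_rows, int) and isinstance(n_cols, int)):
        raise TypeError("shape must be a pair of integers")
    if n_rows < 1 or n_cols < 1:
        raise ValueError("shape dimensions must be positive")
    if fill is None and dtype is not None:
        if dtype not in _DEFAULT_FILL:
            raise ValueError(f"unsupported dtype: {dtype!r}")
        fill = _DEFAULT_FILL[dtype]()
    # An explicit fill always wins over dtype; with neither, cells stay None
    return [[fill] * n_cols for _ in range(n_rows)]


letters = make_grid((3, 3), fill="X")
zeros = make_grid((2, 3), dtype="int")

# Iteratively overwrite the cells, as in the multiplication table example
table = make_grid((6, 6), dtype="int")
for x in range(6):
    for y in range(6):
        table[x][y] = x * y
```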
- classmethod from_sql(sql: str, con: Connection | Any, dtypes: dict = None, **kwargs) SQLDataModel[source]
Creates a SQLDataModel object by executing the provided SQL query using the specified SQL connection. If a single word is provided as the sql argument, the method wraps it in a select-all query, treating the text as the target table.
- Supported Connection APIs:
SQLite using sqlite3 or url with format 'file:///path/to/database.db'
PostgreSQL using psycopg2 or url with format 'postgresql://user:pass@hostname:port/db'
SQL Server ODBC using pyodbc or url with format 'mssql://user:pass@hostname:port/db'
Oracle using cx_Oracle or url with format 'oracle://user:pass@hostname:port/db'
Teradata using teradatasql or url with format 'teradata://user:pass@hostname:port/db'
- Parameters:
sql (str) – The SQL query to execute and use to create the SQLDataModel.
con (sqlite3.Connection | Any) – The database connection object or url; supported connection APIs are sqlite3, psycopg2, pyodbc, cx_Oracle, teradatasql.
dtypes (dict, optional) – A dictionary of the format 'column': 'python dtype' to assign to values. Default is None, mapping types from the source connection.
**kwargs – Additional arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the executed SQL query.
- Return type:
SQLDataModel
- Raises:
TypeError – If the dtypes argument is provided and is not of type dict representing python data types to assign to values.
SQLProgrammingError – If the provided SQL connection is not opened or valid, or the SQL query is invalid or malformed.
ModuleNotFoundError – If con is provided as a connection url and the specified scheme driver module is not found.
DimensionError – If the provided SQL query returns no data.
Examples:
From SQL Table
import sqlite3
import sqldatamodel as sdm

# Open a connection to the database
con = sqlite3.connect('./database/users.db')

# Single word parameter
df = sdm.from_sql("table_name", con)

# Equivalent query executed
df = sdm.from_sql("select * from table_name", con)
From SQLite Database
import sqlite3
import sqldatamodel as sdm

# Create connection object
sqlite_db_conn = sqlite3.connect('./database/users.db')

# Basic usage with a select query
df = sdm.from_sql("SELECT * FROM my_table", sqlite_db_conn)

# When a single word is provided, it is treated as a table name for a select all query
df_table = sdm.from_sql("my_table", sqlite_db_conn)
From PostgreSQL Database
import psycopg2
import sqldatamodel as sdm

# Create connection object
pg_db_conn = psycopg2.connect('dbname=users user=postgres password=postgres')

# Basic usage with a select query
df = sdm.from_sql("SELECT * FROM my_table", pg_db_conn)

# When a single word is provided, it is treated as a table name for a select all query
df_table = sdm.from_sql("my_table", pg_db_conn)
From SQL Server Database
import pyodbc
import sqldatamodel as sdm

# Create connection object
con = pyodbc.connect("DRIVER={SQL Server};SERVER=host;DATABASE=db;UID=user;PWD=pw;")

# Basic usage with a select query
df = sdm.from_sql("SELECT * FROM my_table", con)

# When a single word is provided, it is treated as a table name for a select all query
df_table = sdm.from_sql("my_table", con)
Note
When con is provided as a string, a connection will be attempted using utils._create_connection() if the path does not exist; otherwise a local sqlite3 connection will be attempted.
When con is provided as an object, the connection is assumed to be open and valid; if a cursor cannot be created from the object, an exception will be raised.
An unsupported connection object will emit a SQLDataModelWarning advising of unstable or undefined behaviour.
The dtypes, if provided, are only applied to sqlite3 connection objects, as the remaining supported connections implement SQL to python adapters.
See related SQLDataModel.to_sql() for writing to SQL database connections.
See utility methods utils._parse_connection_url() and utils._create_connection() for the implementation of creating database connections from urls.
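The 'scheme://user:pass@host:port/db' format listed above can be decomposed with the standard library's urllib.parse. The sketch below is only an illustration of that decomposition; the function name split_connection_url and the returned dictionary layout are hypothetical and not the behaviour of utils._parse_connection_url().

```python
from urllib.parse import urlparse


def split_connection_url(url):
    """Break a connection url into the pieces a database driver would need."""
    parts = urlparse(url)
    if parts.scheme == "file":
        # SQLite urls carry only a filesystem path
        return {"scheme": "file", "database": parts.path}
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


pg = split_connection_url("postgresql://user:pass@localhost:5432/db")
lite = split_connection_url("file:///path/to/database.db")
```

The scheme is what would select the driver module (for example psycopg2 for 'postgresql'), which is why a missing driver surfaces as a ModuleNotFoundError.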
- Changelog:
- Version 2.3.0 (2026-01-21):
Modified to allow returning an empty result set from the execution of sql, constructing an empty model using the headers returned from the cursor.
- Version 0.9.1 (2024-06-27):
Modified handling of con parameter to allow a database connection URL to also be provided as 'scheme://user:pass@host:port/db'.
- Version 0.8.2 (2024-06-24):
Modified handling of con parameter to allow providing a SQLite database filepath directly as a string to instantiate the connection.
- Version 0.3.0 (2024-03-31):
Renamed sql_query parameter to sql for consistency with similar method arguments.
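As a rough illustration of the empty result set behaviour noted for version 2.3.0, the sketch below shows how a DB API 2.0 cursor still exposes column names when a query returns no rows. It uses only the standard sqlite3 module; the table and column names are made up, and this is not the package's own code.

```python
import sqlite3

# In-memory database with an empty table (names are illustrative only)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")

cur = con.execute("SELECT * FROM my_table")
rows = cur.fetchall()

# cursor.description holds one 7-item tuple per column; the first item is the name
headers = [col[0] for col in cur.description]
con.close()

print(rows)     # []
print(headers)  # ['id', 'name']
```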
- classmethod from_text(text_source: str, table_identifier: int = 1, encoding: str = 'utf-8', headers: list[str] = None, **kwargs) SQLDataModel[source]
Returns a new SQLDataModel generated from the provided text_source, either as a file if the path exists, or from a raw string literal if the path does not exist.
- Parameters:
text_source (str) – The path to the tabular data file or a raw string literal containing tabular data.
table_identifier (int, optional) – The index position of the target table within the text source. Default is 1.
encoding (str, optional) – The encoding used to decode the text source if it is a file. Default is ‘utf-8’.
headers (list, optional) – The headers to use for the provided data. Default is to use the first row.
**kwargs – Additional keyword arguments to be passed to the SQLDataModel constructor.
- Returns:
The SQLDataModel object created from the provided tabular data.
- Return type:
SQLDataModel
- Raises:
TypeError – If text_source is not a string or table_identifier is not an integer.
ValueError – If no tabular data is found in text_source, if parsing fails to extract valid tabular data, or if the provided table_identifier is out of range.
IndexError – If the provided table_identifier exceeds the number of tables found in text_source.
Exception – If an error occurs while attempting to read from or process the provided text_source.
Example:
    import sqldatamodel as sdm

    # Text source containing tabular data
    text_source = "/path/to/tabular_data.txt"

    # Create the model using the text source
    df = sdm.from_text(text_source, table_identifier=2)
Note
This method is made for parsing SQLDataModel formatted text, such as the kind generated with print(df) or the output created by the inverse method SQLDataModel.to_text().
For parsing other delimited tabular data, this method calls the related SQLDataModel.from_csv() method, which parses tabular data constructed with common delimiters.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- classmethod from_xml(xml_source: str, orient: Literal['rows', 'columns'] = 'rows', row_tag: str = 'row', column_tag: str = 'column', value_tag: str = 'value', root_tag: str | None = None, encoding: str = 'utf-8', infer_types: bool = True, **kwargs) SQLDataModel[source]
Creates a new SQLDataModel instance from an XML source.
- Parameters:
xml_source (str) – File path, URL, or raw XML string.
orient (Literal['rows','columns']) – Orientation of XML data where ‘rows’ treats each row as a record and ‘columns’ treats each column as a list of values.
row_tag (str) – Row tag name when orient='rows'.
column_tag (str) – Column tag name when orient='columns'.
value_tag (str) – Value tag name inside column elements.
root_tag (str | None) – Optional root element selector.
encoding (str) – Encoding for file or URL input.
infer_types (bool) – Whether to infer column types.
- Returns:
The SQLDataModel object created from the provided XML data.
- Return type:
SQLDataModel
- Raises:
TypeError – If xml_source is not a string type.
ValueError – If value for orient is not one of ‘rows’ or ‘columns’ representing the data orientation.
Example:
    import sqldatamodel as sdm

    # XML data as string literal
    xml_literal = '''
    <data>
        <row>
            <Name>Alice</Name>
            <Age>25</Age>
            <Grade>3.8</Grade>
        </row>
        <row>
            <Name>Bob</Name>
            <Age>30</Age>
            <Grade>3.9</Grade>
        </row>
        <row>
            <Name>Charlie</Name>
            <Age>35</Age>
            <Grade>3.2</Grade>
        </row>
    </data>'''

    # Create the model from the XML data
    df = sdm.SQLDataModel.from_xml(xml_literal)

    # View the resulting model
    print(df)
This will output:
    ┌───┬─────────┬─────┬───────┐
    │   │ Name    │ Age │ Grade │
    ├───┼─────────┼─────┼───────┤
    │ 0 │ Alice   │  25 │  3.80 │
    │ 1 │ Bob     │  30 │  3.90 │
    │ 2 │ Charlie │  35 │  3.20 │
    └───┴─────────┴─────┴───────┘
    [3 rows x 3 columns]
Alternatively, column names can be parsed from name attributes of <col> tags:
    import sqldatamodel as sdm

    # Sample XML str literal
    xml = '''
    <data>
        <row>
            <col name="1">Alice</col>
            <col name="2">30</col>
        </row>
        <row>
            <col name="1">Bob</col>
            <col name="2">25</col>
        </row>
    </data>
    '''

    df = sdm.SQLDataModel.from_xml(xml)

    print(df.headers) # [1, 2]
    print(df.to_json(index=False)) # [{"1": "Alice", "2": 30}, {"1": "Bob", "2": 25}]
Note
The headers will be parsed from either a direct self-named <COLUMN_NAME> tag, or from a generic <col> tag’s name attribute if serialized accordingly.
- Changelog:
- Version 2.3.1 (2026-01-22):
New method.
- generate_apply_function_stub() str[source]
Generates a function template using the current SQLDataModel to format function arguments for the SQLDataModel.apply_function_to_column() method.
- Returns:
A string representing the function template.
- Return type:
str
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('data.csv')

    # Create the stub
    stub = df.generate_apply_function_stub()

    # View it
    print(stub)
This will output:
    def func(user_name:str, user_age:int, user_salary:float):
        # apply logic and return value
        return
The stub contains all the required inputs and column names needed to generate a compatible function to apply to the model, and can be copied and pasted into existing code.
Note
This method is meant as a general informative tool or for debugging assistance if needed.
See the SQLDataModel.apply() method for usage and implementation of functions in SQLDataModel using sqlite3.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
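To make the shape of the generated stub concrete, here is a minimal sketch that builds a similar template from a column-to-dtype mapping. The function build_function_stub and its arguments are hypothetical and only imitate the output format shown above; the real method reads the columns from the model itself.

```python
def build_function_stub(dtypes: dict[str, str], func_name: str = "func") -> str:
    """Hypothetical helper: format a function template from {'column': 'dtype'} pairs."""
    args = ", ".join(f"{col}:{dtype}" for col, dtype in dtypes.items())
    return f"def {func_name}({args}):\n    # apply logic and return value\n    return"


stub = build_function_stub({"user_name": "str", "user_age": "int", "user_salary": "float"})
print(stub)
```

The resulting string is valid Python source, so it can be pasted into a module and filled in with the column logic.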
- get_column_alignment() str[source]
Returns the current column_alignment property value, dynamic by default.
- Returns:
The current value of the column_alignment property.
- Return type:
str
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

    # Get the current alignment value
    alignment = df.get_column_alignment()

    # Outputs 'dynamic'
    print(alignment)
Note
Use SQLDataModel.set_column_alignment() to modify column alignment.
- get_column_dtypes(columns: str | int | list = None, dtypes: Literal['python', 'sql'] = 'python') dict[source]
Get the data types of specified columns as either Python or SQL datatypes as a dict in the format of {'column': 'dtype'}.
- Parameters:
columns (str | int | list) – The column or columns for which to retrieve data types. Defaults to all columns.
dtypes (Literal["python", "sql"]) – The format in which to retrieve data types. Defaults to “python”.
- Raises:
TypeError – If columns is not of type str, int, or list.
IndexError – If columns is of type int and the index is outside the valid range.
ValueError – If a specified column in columns is not found in the current dataset. Use SQLDataModel.get_headers() to view valid columns.
- Returns:
A dictionary mapping column names to their data types.
- Return type:
dict
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['first', 'last', 'age', 'service', 'hire_date']
    data = [
        ('John', 'Smith', 27, 1.22, '2023-02-01'),
        ('Sarah', 'West', 39, 0.7, '2023-10-01'),
        ('Mike', 'Harlin', 36, 3.9, '2020-08-27'),
        ('Pat', 'Douglas', 42, 11.5, '2015-11-06'),
        ('Kelly', 'Lee', 32, 8.0, '2016-09-18')
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Get all column python dtypes
    df_dtypes = df.get_column_dtypes()

    # View dict items
    for col, dtype in df_dtypes.items():
        print(f"{col}: {dtype}")
This will output:
    first: str
    last: str
    age: int
    service: float
    hire_date: date
Get SQL data types as well:
    # Get specific column sql dtypes
    df_dtypes = df.get_column_dtypes(columns=['first','age','service'], dtypes="sql")

    # View dict items
    for col, dtype in df_dtypes.items():
        print(f"{col}: {dtype}")
This will output:
    first: TEXT
    age: INTEGER
    service: REAL
Note
The SQLDataModel index column is not included; only columns specified in the SQLDataModel.headers attribute are in scope.
Only the dtypes are returned; any primary key references are removed to ensure compatibility with external calls.
Python datatypes are returned in lower case, while SQL dtypes are returned in upper case to reflect convention.
See SQLDataModel.dtypes for direct mapping from column to Python data type returned as {'col': 'dtype'}.
- Changelog:
- Version 0.8.0 (2024-06-21):
Modified to allow the columns argument to be provided as any valid reference, including integer indexes or an iterable sequence of indexes, to reflect similar flexibility surrounding column referencing across the package.
- Version 0.1.9 (2024-03-19):
New method.
- get_display_float_precision() int[source]
Retrieves the current float display precision used exclusively for representing the values of real numbers in the repr method for the SQLDataModel. Default value is set to 2 decimal places of precision, matching the display_float_precision default in the class signature.
- Returns:
The current float display precision.
- Return type:
int
Note
The float display precision is the number of decimal places to include when displaying real numbers in the string representation of the SQLDataModel.
This value is utilized in the repr method to control the precision of real number values.
The method does not affect the actual value of float dtypes in the underlying SQLDataModel.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- get_display_index() bool[source]
Returns the current value set at SQLDataModel.display_index, which determines whether or not the index is displayed in the SQLDataModel representation.
- Returns:
The current value of the display_index property.
- Return type:
bool
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

    # Get the current value for displaying the index
    display_index = df.get_display_index()

    # Output: True
    print(display_index)
Note
Use SQLDataModel.set_display_index() to modify this property and toggle index display visibility.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- get_display_max_rows() int | None[source]
Retrieves the current value at SQLDataModel.display_max_rows, which determines the maximum rows displayed for the SQLDataModel.
- Returns:
The current value set at SQLDataModel.display_max_rows.
- Return type:
int or None
Example:
    import sqldatamodel as sdm

    # Create model
    df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

    # Get current value
    display_max_rows = df.get_display_max_rows()

    # By default rows will be limited by current terminal height
    print(display_max_rows) # None
Note
This does not affect the actual number of rows in the model, only the maximum displayed.
Use SQLDataModel.set_display_max_rows() to explicitly set a max row limit instead of using terminal height.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- get_headers() list[str][source]
Returns the current SQLDataModel headers.
- Returns:
A list of strings representing the headers.
- Return type:
list
Example:
    import sqldatamodel as sdm

    # Create model
    df = sdm.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

    # Get current model headers
    headers = df.get_headers()

    # Display values
    print(headers) # outputs: ['First Name', 'Last Name', 'Salary']
- Changelog:
- Version 0.1.5 (2023-11-24):
New method.
- get_indicies() tuple[source]
Returns the current valid row indices for the SQLDataModel instance.
- Returns:
A tuple of the current values for SQLDataModel.sql_idx in ascending order.
- Return type:
tuple
Example:
    import sqldatamodel as sdm

    headers = ['Name', 'Age', 'Height']
    data = [('John', 30, 175.3), ('Alice', 28, 162.0), ('Travis', 35, 185.8)]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Get current valid indices
    valid_indicies = df.get_indicies()

    # View results
    print(valid_indicies)
This will output:
    (0, 1, 2)
Note
Primary use is to confirm valid model indexing when the starting index != 0 or when filtering changes the minimum/maximum indexes.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- get_max_column_width() int[source]
Returns the current max_column_width property value.
- Returns:
The current value of the max_column_width property.
- Return type:
int
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

    # Get the current max column width value
    max_width = df.get_max_column_width()

    # Output
    print(max_width) # 38
- get_min_column_width() int[source]
Returns the current min_column_width property value.
- Returns:
The current value of the min_column_width property.
- Return type:
int
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

    # Get and save the current value
    min_width = df.get_min_column_width()

    # Output
    print(min_width) # 3
- get_model_name() str[source]
Returns the SQLDataModel table name currently being used by the model as an alias for any SQL queries executed by the user and internally.
- Returns:
The current SQLDataModel table name set by the value of attribute SQLDataModel.model_name.
- Return type:
str
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('example.csv', headers=['Column1', 'Column2'])

    # Get the current name
    model_name = df.get_model_name()

    # View it
    print(f'The model is currently using the table name: {model_name}')
Note
Use SQLDataModel.set_model_name() to modify the table name used internally to represent the SQLDataModel instance.
- Changelog:
- Version 0.1.5 (2023-11-24):
New method.
- get_shape() tuple[int, int][source]
Returns the current shape of the SQLDataModel as a tuple of (rows, columns).
- Returns:
A tuple representing the current dimensions of rows and columns in the SQLDataModel.
- Return type:
tuple[int, int]
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.SQLDataModel([[1,2,3], [4,5,6], [7,8,9]])

    # Get the current shape
    shape = df.get_shape()

    # View it
    print("shape:", shape)
This will output:
    shape: (3, 3)
The shape can also be seen when printing the model:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.SQLDataModel([[1,2,3], [4,5,6], [7,8,9]])

    # View it and the shape
    print(df, "<-- shape is also visible here")
This will output:
    ┌───┬───────┬───────┬───────┐
    │   │ col_0 │ col_1 │ col_2 │
    ├───┼───────┼───────┼───────┤
    │ 0 │     1 │     2 │     3 │
    │ 1 │     4 │     5 │     6 │
    │ 2 │     7 │     8 │     9 │
    └───┴───────┴───────┴───────┘
    [3 rows x 3 columns] <-- shape is also visible here
Note
If an empty model is initialized, the SQLDataModel.row_count will be 0 until the first row is inserted.
Using the SQLDataModel.__getitem__() syntax of df[row, col] returns a new model instance with the corresponding shape.
- Changelog:
- Version 0.3.6 (2024-04-09):
Returns the new SQLDataModel.shape directly, making this method redundant.
- classmethod get_supported_sql_connections() tuple[source]
Returns the currently tested DB API 2.0 dialects for use with the SQLDataModel.from_sql() method.
- Returns:
A tuple of supported DB API 2.0 dialects.
- Return type:
tuple
Example:
    import sqldatamodel as sdm

    # Get supported dialects
    supported_dialects = sdm.SQLDataModel.get_supported_sql_connections()

    # View details
    print(supported_dialects)

    # Outputs
    supported_dialects = ('sqlite3', 'psycopg2', 'pyodbc', 'cx_oracle', 'teradatasql')
- Changelog:
- Version 0.1.5 (2023-11-24):
New method.
- group_by(columns: str | list[str], order_by_count: bool = True) SQLDataModel[source]
Returns a new SQLDataModel after performing a group by operation on specified columns.
- Parameters:
columns (str, list, tuple) – Columns to group by. Accepts either individual strings or a list/tuple of strings.
order_by_count (bool, optional) – If True (default), orders the result by count. If False, orders by the specified columns.
- Raises:
TypeError – If the columns argument is not of type str, list, or tuple.
ValueError – If any specified column does not exist in the current model.
SQLProgrammingError – If any specified columns or aggregate keywords are invalid or incompatible with the current model.
- Returns:
A new SQLDataModel instance containing the result of the group by operation.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    headers = ['first', 'last', 'age', 'service', 'hire_date', 'gender']
    data = [
        ('John', 'Smith', 27, 1.22, '2023-02-01', 'Male'),
        ('Sarah', 'West', 39, 0.7, '2023-10-01', 'Female'),
        ('Mike', 'Harlin', 36, 3.9, '2020-08-27', 'Male'),
        ('Pat', 'Douglas', 42, 11.5, '2015-11-06', 'Male'),
        ('Kelly', 'Lee', 32, 8.0, '2016-09-18', 'Female')
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers, display_float_precision=2, display_index=True)

    # Group by 'gender' column
    df_gender = df.group_by("gender")

    # View model
    print(df_gender)
This will output:
    ┌───┬────────┬───────┐
    │   │ gender │ count │
    ├───┼────────┼───────┤
    │ 0 │ Male   │     3 │
    │ 1 │ Female │     2 │
    └───┴────────┴───────┘
    [2 rows x 2 columns]
Multiple columns can also be used to group by:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('data.csv')

    # Group by multiple columns
    df.group_by(["country", "state", "city"])
Note
Use order_by_count=False to change ordering from count to the column arguments provided.
See SQLDataModel.describe() for generating descriptive statistics by column data type.
See SQLDataModel.pivot() for creating a pivot table using categorization and aggregate functions.
- Changelog:
- Version 0.8.0 (2024-06-21):
Modified to allow columns to be referenced by their integer index as well as by name, allowing broader inputs and reflecting similar access patterns across the package.
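Since the class documentation notes that anything beyond the built-in helpers requires SQL, it may help to see the kind of query a grouped count corresponds to. The sketch below uses the standard sqlite3 module with a made-up people table; it is an assumed equivalent for illustration, not the statement the package actually generates.

```python
import sqlite3

rows = [
    ("John", "Male"), ("Sarah", "Female"), ("Mike", "Male"),
    ("Pat", "Male"), ("Kelly", "Female"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (first TEXT, gender TEXT)")
con.executemany("INSERT INTO people VALUES (?, ?)", rows)

# Comparable to order_by_count=True: largest groups first
by_count = con.execute(
    'SELECT gender, COUNT(*) AS "count" FROM people GROUP BY gender ORDER BY "count" DESC'
).fetchall()

# Comparable to order_by_count=False: ordered by the grouping column
by_column = con.execute(
    'SELECT gender, COUNT(*) AS "count" FROM people GROUP BY gender ORDER BY gender'
).fetchall()
con.close()

print(by_count)   # [('Male', 3), ('Female', 2)]
print(by_column)  # [('Female', 2), ('Male', 3)]
```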
- head(n_rows: int = 5) SQLDataModel[source]
Returns the first n_rows of the current SQLDataModel.
- Parameters:
n_rows (int, optional) – Number of rows to return. Defaults to 5.
- Raises:
TypeError – If n_rows argument is not of type ‘int’ representing the number of rows to return from the head of the model.
- Returns:
A new SQLDataModel instance containing the specified number of rows.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Countries data available for sample dataset
    url = 'https://developers.google.com/public-data/docs/canonical/countries_csv'

    # Create the model
    df = sdm.from_html(url)

    # Get head of model
    df_head = df.head()

    # View it
    print(df_head)
This will grab the top 5 rows by default:
    ┌───┬─────────┬──────────┬───────────┬────────────────┐
    │   │ country │ latitude │ longitude │ name           │
    ├───┼─────────┼──────────┼───────────┼────────────────┤
    │ 0 │ AF      │  33.9391 │   67.7100 │ Afghanistan    │
    │ 1 │ AL      │  41.1533 │   20.1683 │ Albania        │
    │ 2 │ DZ      │  28.0339 │    1.6596 │ Algeria        │
    │ 3 │ AS      │ -14.2710 │ -170.1322 │ American Samoa │
    │ 4 │ AD      │  42.5462 │    1.6016 │ Andorra        │
    └───┴─────────┴──────────┴───────────┴────────────────┘
    [5 rows x 4 columns]
Note
See related SQLDataModel.tail() for the opposite, grabbing the bottom n_rows from the current model.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- header_master[source]
Maps the current model’s column metadata in the format of 'column_name': ('sql_dtype', 'py_dtype', is_regular_column, 'default_alignment'), updated by SQLDataModel._update_model_metadata().
- Type:
dict[str, tuple]
- headers[source]
The current column names of the model. If not provided, default column names will be used.
- Type:
list[str]
- hstack(*other: SQLDataModel, inplace: bool = False) SQLDataModel[source]
Horizontally stacks one or more SQLDataModel objects to the current model.
- Parameters:
other (SQLDataModel or sequence of) – The SQLDataModel objects to horizontally stack.
inplace (bool, optional) – If True, performs the horizontal stacking in-place, modifying the current model. Defaults to False, returning a new SQLDataModel.
- Returns:
The horizontally stacked SQLDataModel instance when inplace is False.
- Return type:
SQLDataModel
- Raises:
ValueError – If no additional SQLDataModels are provided for horizontal stacking.
TypeError – If any argument in ‘other’ is not of type SQLDataModel, list, or tuple.
SQLProgrammingError – If an error occurs when updating the model values in place.
Example:
    import sqldatamodel as sdm

    # Create models A and B
    df_a = sdm.SQLDataModel([('A', 'B'), ('1', '2')], headers=['A1', 'A2'])
    df_b = sdm.SQLDataModel([('C', 'D'), ('3', '4')], headers=['B1', 'B2'])

    # Horizontally stack B onto A
    df_ab = df_a.hstack(df_b)

    # View stacked model
    print(df_ab)
This will output the result of stacking B onto A, using each model’s headers and dtypes:
    ┌─────┬─────┬─────┬─────┐
    │ A1  │ A2  │ B1  │ B2  │
    ├─────┼─────┼─────┼─────┤
    │ A   │ B   │ C   │ D   │
    │ 1   │ 2   │ 3   │ 4   │
    └─────┴─────┴─────┴─────┘
    [2 rows x 4 columns]
Multiple models can be stacked simultaneously; here we stack a total of 3 models:
    # Create a third model C
    df_c = sdm.SQLDataModel([('E', 'F'), ('5', '6')], headers=['C1', 'C2'])

    # Horizontally stack three models
    df_abc = df_a.hstack([df_b, df_c])

    # View stacked result
    print(df_abc)
This will output the result of stacking C and B onto A:
    ┌─────┬─────┬─────┬─────┬─────┬─────┐
    │ A1  │ A2  │ B1  │ B2  │ C1  │ C2  │
    ├─────┼─────┼─────┼─────┼─────┼─────┤
    │ A   │ B   │ C   │ D   │ E   │ F   │
    │ 1   │ 2   │ 3   │ 4   │ 5   │ 6   │
    └─────┴─────┴─────┴─────┴─────┴─────┘
    [2 rows x 6 columns]
Note
Model dimensions will be truncated or padded to coerce compatible dimensions when stacking; use SQLDataModel.merge() for strict SQL joins instead of hstack.
Headers and data types are inherited from all the models being stacked. This requires aliasing duplicate column names if present; see utils.alias_duplicates() for aliasing rules.
Use setitem syntax such as df['New Column'] = values to create new columns directly in the current model instead of stacking, or see SQLDataModel.add_column_with_values() for a convenience method accomplishing the same.
See SQLDataModel.vstack() for vertical stacking.
- Changelog:
- Version 0.3.4 (2024-04-05):
New method.
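The note says dimensions are truncated or padded when stacking, without spelling out the rule. The sketch below shows one plausible reading on plain lists of tuples, assuming the base table's row count is kept, longer tables are truncated and shorter ones are padded with None. The function hstack_rows is hypothetical and is not the package's implementation.

```python
def hstack_rows(base, *others):
    """Hypothetical helper: append the columns of each table in others to base.

    Assumption: the row count of base is preserved, so longer tables are
    truncated and shorter tables are padded with None.
    """
    stacked = [tuple(row) for row in base]
    for other in others:
        width = len(other[0]) if other else 0
        for i in range(len(stacked)):
            extra = tuple(other[i]) if i < len(other) else (None,) * width
            stacked[i] += extra
    return stacked


a = [("A", "B"), ("1", "2")]
b = [("C", "D"), ("3", "4")]
c = [("E", "F")]            # one row short, padded with None
d = [("X",), ("Y",), ("Z",)]  # one row too many, truncated

padded = hstack_rows(a, b, c)
truncated = hstack_rows(a, d)
print(padded)     # [('A', 'B', 'C', 'D', 'E', 'F'), ('1', '2', '3', '4', None, None)]
print(truncated)  # [('A', 'B', 'X'), ('1', '2', 'Y')]
```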
- infer_dtypes(n_samples: int = 16, date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') None[source]
Infer and set data types for columns based on a random subset of n_samples from the current model. The dateutil library is required for complex date and datetime parsing; if the module is not found then date_format and datetime_format will be used for dates and datetimes respectively.
- Parameters:
n_samples (int) – The number of random samples to use for data type inference. Default set to 16.
date_format (str) – The format string to use for parsing date values if dateutil library is not found. Default is ‘%Y-%m-%d’.
datetime_format (str) – The format string to use for parsing datetime values if dateutil library is not found. Default is ‘%Y-%m-%d %H:%M:%S’.
- Raises:
TypeError – If argument for n_samples is not of type int or if argument for date_format or datetime_format is not of type ‘str’.
ValueError – If the current model contains zero columns from which to infer types.
DimensionError – If the current model contains insufficient rows to sample from.
- Returns:
Inferred column types are updated and None is returned.
- Return type:
None
Example:
    import sqldatamodel as sdm

    # Sample data of ``str`` containing probable datatypes
    headers = ['first', 'last', 'age', 'service', 'hire_date']
    data = [
        ('John', 'Smith', '27', '1.22', '2023-02-01'),
        ('Sarah', 'West', '39', '0.7', '2023-10-01'),
        ('Mike', 'Harlin', '36', '3.9', '2020-08-27'),
        ('Pat', 'Douglas', '42', '11.5', '2015-11-06'),
        ('Kelly', 'Lee', '32', '8.0', '2016-09-18')
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Get current column dtypes for reference
    dtypes_before = df.get_column_dtypes()

    # Infer and set data types based on 10 random samples
    df.infer_dtypes(n_samples=10)

    # View updated model
    print(df)
This will output data with dtypes correctly aligned:
    ┌───────┬─────────┬──────┬─────────┬────────────┐
    │ first │ last    │  age │ service │ hire_date  │
    ├───────┼─────────┼──────┼─────────┼────────────┤
    │ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │
    │ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │
    │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │
    │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │
    │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │
    └───────┴─────────┴──────┴─────────┴────────────┘
    [5 rows x 5 columns]
Use SQLDataModel.get_column_dtypes() or SQLDataModel.dtypes to view current types:
    # Get new column types to confirm
    dtypes_after = df.get_column_dtypes()

    # View updated dtypes
    for col in df.headers:
        print(f"{col}: {dtypes_before[col]} -> {dtypes_after[col]}")
This will output:
    first: str -> str
    last: str -> str
    age: str -> int
    service: str -> float
    hire_date: str -> date
Note
If a single str instance is found in the samples, the corresponding column dtype will remain as str to avoid data loss.
Co-occurrences of int & float, or date & datetime will favor the superset dtype after infer_threshold is met, so float and datetime respectively.
If a single datetime instance is found amongst a higher proportion of date dtypes, datetime will be used according to the second rule.
If a single float instance is found amongst a higher proportion of int dtypes, float will be used according to the second rule.
Ties between dtypes are broken according to current type < str < float < int < datetime < date < bytes < None.
This method calls SQLDataModel.set_column_dtypes() once the column dtypes have been inferred if they differ from the current dtype.
See SQLDataModel.infer_str_type() for the type determination process.
See utils.infer_types_from_data() for the type voting scheme used for inference.
- Changelog:
- Version 0.2.0 (2024-03-19):
Increased sampling size for inference from n_samples=10 to n_samples=16 for better resolution.
- Version 0.1.9 (2024-03-19):
New method.
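The first two rules in the note (a single str keeps the column as str, and mixed int/float or date/datetime samples resolve to the superset type) can be sketched with the standard library alone. The functions infer_value_type and infer_column_type are hypothetical simplifications: they ignore the infer_threshold, the tie-breaking order, bytes and None handling, and dateutil parsing that the real method relies on.

```python
from datetime import datetime


def infer_value_type(value: str, date_format: str = "%Y-%m-%d",
                     datetime_format: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Hypothetical helper: classify one string sample by trying progressively looser parsers."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    for fmt, name in ((date_format, "date"), (datetime_format, "datetime")):
        try:
            datetime.strptime(value, fmt)
            return name
        except ValueError:
            pass
    return "str"


def infer_column_type(samples: list[str]) -> str:
    """Hypothetical helper: combine per-sample types into a single column type."""
    found = {infer_value_type(s) for s in samples}
    if not found or "str" in found:
        return "str"  # a single str sample keeps the column as str
    if found <= {"int", "float"}:
        return "float" if "float" in found else "int"
    if found <= {"date", "datetime"}:
        return "datetime" if "datetime" in found else "date"
    return "str"  # incompatible mixtures fall back to str


headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', '27', '1.22', '2023-02-01'),
    ('Sarah', 'West', '39', '0.7', '2023-10-01'),
    ('Mike', 'Harlin', '36', '3.9', '2020-08-27'),
]
inferred = {h: infer_column_type([row[i] for row in data]) for i, h in enumerate(headers)}
print(inferred)
# {'first': 'str', 'last': 'str', 'age': 'int', 'service': 'float', 'hire_date': 'date'}
```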
- insert_row(index: int, values: list | tuple, on_conflict: Literal['replace', 'ignore'] = 'replace') None[source]
Inserts a new row into the SQLDataModel at the specified index with the provided values.
- Parameters:
index (int) – The position at which to insert the row.
values (list or tuple) – The values to be inserted into the row.
on_conflict (Literal['replace', 'ignore'], optional) – Specifies the action to take if the index already exists. Default is ‘replace’.
- Raises:
TypeError – If index is not an integer or values is not a list or tuple.
ValueError – If on_conflict is not 'replace' or 'ignore'.
DimensionError – If the dimensions of the provided values are incompatible with the current model dimensions.
SQLProgrammingError – If there is an issue with the SQL execution during the insertion.
- Returns:
None
Example:
    import sqldatamodel as sdm

    # Sample data
    data = [('Alice', 20, 'F'), ('Billy', 25, 'M'), ('Chris', 30, 'M')]

    # Create the model
    df = sdm.SQLDataModel(data, headers=['Name','Age','Sex'])

    # Insert a new row at index 3
    df.insert_row(3, ['David', 35, 'M'])

    # Insert or replace row at index 1
    df.insert_row(1, ['Beth', 27, 'F'], on_conflict='replace')

    # View result
    print(df)
This will output the modified model:
    ┌───┬───────┬─────┬─────┐
    │   │ Name  │ Age │ Sex │
    ├───┼───────┼─────┼─────┤
    │ 0 │ Alice │  20 │ F   │
    │ 1 │ Beth  │  27 │ F   │
    │ 2 │ Chris │  30 │ M   │
    │ 3 │ David │  35 │ M   │
    └───┴───────┴─────┴─────┘
    [4 rows x 3 columns]
Note
Use on_conflict = 'ignore' to take no action if the row already exists, and on_conflict = 'replace' to replace it.
See SQLDataModel.append_row() for appending rows at the next available index instead of insertion at index.
- Changelog:
- Version 0.6.0 (2024-05-14):
Added index and on_conflict parameters for greater specificity and to align with broader conventions surrounding insert methods.
- Version 0.1.9 (2024-03-19):
New method.
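The two on_conflict options read much like SQLite's own conflict clauses. As an assumed analogy rather than a description of the package internals, the sketch below shows INSERT OR REPLACE and INSERT OR IGNORE acting on a primary-key index column in a plain sqlite3 table; the table and column names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (idx INTEGER PRIMARY KEY, name TEXT, age INTEGER, sex TEXT)")
con.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(0, "Alice", 20, "F"), (1, "Billy", 25, "M"), (2, "Chris", 30, "M")],
)

# 'replace': the existing row at idx 1 is overwritten
con.execute("INSERT OR REPLACE INTO people VALUES (?, ?, ?, ?)", (1, "Beth", 27, "F"))
# 'ignore': idx 2 already exists, so nothing happens
con.execute("INSERT OR IGNORE INTO people VALUES (?, ?, ?, ?)", (2, "Carl", 31, "M"))
# No conflict at idx 3, so the row is simply inserted
con.execute("INSERT OR IGNORE INTO people VALUES (?, ?, ?, ?)", (3, "David", 35, "M"))

result = con.execute("SELECT * FROM people ORDER BY idx").fetchall()
con.close()

for row in result:
    print(row)
# (0, 'Alice', 20, 'F')
# (1, 'Beth', 27, 'F')
# (2, 'Chris', 30, 'M')
# (3, 'David', 35, 'M')
```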
- isna() set[int][source]
Returns the row indices containing null values from the current model.
- Returns:
Set of row indices containing null values.
- Return type:
set[int]
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['Name', 'Age', 'Gender', 'City']
    data = [
        ('Sarah', 35, 'Female', 'Houston'),
        ('Alice', None, 'Female', 'Milwaukee'),
        ('Mike', None, 'Male', 'Atlanta'),
        ('John', 25, 'Male', 'Boston'),
        ('Bob', None, 'Male', 'Chicago'),
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Filter for rows where 'Age' is null
    df = df[df['Age'].isna()]

    # View result
    print(df)
This will output the result containing the rows where ‘Age’ was null:
    ┌───────┬─────┬────────┬───────────┐
    │ Name  │ Age │ Gender │ City      │
    ├───────┼─────┼────────┼───────────┤
    │ Alice │     │ Female │ Milwaukee │
    │ Mike  │     │ Male   │ Atlanta   │
    │ Bob   │     │ Male   │ Chicago   │
    └───────┴─────┴────────┴───────────┘
    [3 rows x 4 columns]
This can be used in combination with the setitem syntax to selectively update values as well:
    # Filter and set the null values
    df[df['Age'].isna(), 'Age'] = 'Missing'
Note
Null or NA-like is determined by satisfying the SQL NULL value or the Python equivalent None for all values in the row.
See related SQLDataModel.notna() to filter for rows containing values that are not null.
See SQLDataModel.fillna() to fill all missing or null values in the model.
- Changelog:
- Version 0.7.2 (2024-06-11):
New method.
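Because the note defines null in terms of the SQL NULL value, the returned set of row indices can be pictured as the result of an IS NULL filter. The sketch below is an assumed equivalent using the standard sqlite3 module and an explicit idx column; it is not the query the package runs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (idx INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
con.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(0, "Sarah", 35), (1, "Alice", None), (2, "Mike", None), (3, "John", 25), (4, "Bob", None)],
)

# Python None is stored as SQL NULL, so IS NULL finds the missing ages
null_rows = {idx for (idx,) in con.execute("SELECT idx FROM people WHERE age IS NULL")}
not_null_rows = {idx for (idx,) in con.execute("SELECT idx FROM people WHERE age IS NOT NULL")}
con.close()

print(null_rows)      # {1, 2, 4}
print(not_null_rows)  # {0, 3}
```

Returning a set makes it cheap to combine filters with set operations such as union and intersection before indexing back into the data.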
- iter_rows(min_row: int = None, max_row: int = None, index: bool = True, include_headers: bool = False) Iterator[tuple][source]
Returns an iterator over the specified rows in the current SQLDataModel.
- Parameters:
min_row (int, optional) – The minimum row index to start iterating from (inclusive). Defaults to None.
max_row (int, optional) – The maximum row index to iterate up to (exclusive). Defaults to None.
index (bool, optional) – Whether to include the row index in the output. Defaults to True.
include_headers (bool, optional) – Whether to include headers as the first row. Defaults to False.
- Yields:
Iterator[tuple] – An iterator containing the rows from the specified range with headers as the first row if specified.
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('example.csv', headers=['First', 'Last', 'Salary'])

    # Iterate over the rows
    for row in df.iter_rows(min_row=2, max_row=4):
        pass # Do stuff
Note
Rows are referenced by their index and not their value. E.g., min_row = 0 and max_row = -1 will reference the first and last rows, respectively.
See SQLDataModel.iter_tuples() for iterating over rows as named tuples.
- Changelog:
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.9 (2024-03-19):
New method.
- iter_tuples(index: bool = False) Iterator[NamedTuple][source]
Returns an iterator of rows from the current SQLDataModel as namedtuples using headers as field names.
- Parameters:
index (bool, optional) – Whether to include the index column in the namedtuples. Default is False.
- Raises:
ValueError – Raised if headers are not valid Python identifiers. Use SQLDataModel.normalize_headers() method to fix.
- Yields:
Iterator[NamedTuple] – An iterator of namedtuples for each row using current headers for field names.
Example:
    import sqldatamodel as sdm

    # Create the model
    df = sdm.from_csv('example.csv', headers=['First', 'Last', 'Salary'])

    # Iterate over the namedtuples
    for row_tuple in df.iter_tuples(index=True):
        pass # Do stuff with namedtuples
Note
See SQLDataModel.iter_rows() for iterating over rows with custom start and stop indices.
- Changelog:
- Version 0.10.0 (2024-06-29):
Renamed include_idx_col parameter to index for package consistency.
Modified to use SQLDataModel._generate_sql_stmt_fetchall() to leverage deterministic behavior of method.
- Version 0.1.9 (2024-03-19):
New method.
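The ValueError listed above follows from how Python's namedtuple validates field names. The sketch below shows the failure for headers containing spaces and one possible fix; the normalize function is a hypothetical stand-in and may not match what SQLDataModel.normalize_headers() actually produces.

```python
import re
from collections import namedtuple

headers = ["First Name", "Last Name", "Salary"]

# Field names must be valid identifiers, so headers with spaces are rejected
try:
    namedtuple("Row", headers)
    error = None
except ValueError as exc:
    error = exc


def normalize(header: str) -> str:
    """Hypothetical normalizer: lowercase and replace runs of non-word characters with '_'."""
    return re.sub(r"\W+", "_", header.strip().lower()).strip("_")


Row = namedtuple("Row", [normalize(h) for h in headers])
row = Row("Ada", "Lovelace", 100000.0)
print(Row._fields)                 # ('first_name', 'last_name', 'salary')
print(row.first_name, row.salary)  # Ada 100000.0
```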
- max() SQLDataModel[source]
Returns a new
SQLDataModelcontaining the maximum value of all non-null values for each column in a row-wise orientation.- Returns:
A new SQLDataModel containing the maximum non-null value for each column.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm # Sample data with missing values headers = ['Name', 'Age', 'Gender', 'Tenure'] data = [ ('Alice', 25, 'Female', 1.0), ('Bob', None, 'Male', 2.7), ('Charlie', 30, 'Male', None), ('David', None, 'Male', 3.8) ] # Create the model df = sdm.SQLDataModel(data, headers) # Get maximum values min_values = df.min() # View result print(min_values)
This will output the maximum value of all non-null values for each column:
┌───────┬─────┬────────┬────────┐ │ Name │ Age │ Gender │ Tenure │ ├───────┼─────┼────────┼────────┤ │ David │ 30 │ Male │ 3.80 │ └───────┴─────┴────────┴────────┘ [1 rows x 4 columns]
Note
See SQLDataModel.count_unique() for column-wise count of unique, null and total values for each column.
See SQLDataModel.min() for returning the minimum values in each column.
- Changelog:
- Version 0.3.1 (2024-04-01):
New method.
- max_column_width[source]
The maximum column width in characters to use for string representations of the data. Default is 38.
- Type:
int
- mean() SQLDataModel[source]
Returns a new SQLDataModel containing the mean value of all viable columns in the current model. Calculated by sum(x_1, ..., x_N) * (1 / N).
- Returns:
A new SQLDataModel containing the mean values of each column.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Birthday', 'Height', 'Date of Hire']
data = [
    ('John', 30, '1994-06-15', 175.3, '2018-03-03 11:20:19'),
    ('Alice', 28, '1996-11-20', 162.0, '2023-04-24 08:45:30'),
    ('Travis', 37, '1987-01-07', 185.8, '2012-10-06 15:30:40')
]

# Create the model and infer correct types
df = sdm.SQLDataModel(data, headers, infer_types=True)

# View full model
print(df)
This will output the sample model we’ll be using to calculate mean values for:
┌────────┬─────┬────────────┬─────────┬─────────────────────┐
│ Name   │ Age │ Birthday   │  Height │ Date of Hire        │
├────────┼─────┼────────────┼─────────┼─────────────────────┤
│ John   │  30 │ 1994-06-15 │  175.30 │ 2018-03-03 11:20:19 │
│ Alice  │  28 │ 1996-11-20 │  162.00 │ 2023-04-24 08:45:30 │
│ Travis │  37 │ 1987-01-07 │  185.80 │ 2012-10-06 15:30:40 │
└────────┴─────┴────────────┴─────────┴─────────────────────┘
[3 rows x 5 columns]
Now let’s find the mean values:
# Calculate the mean values
df_mean = df.mean()

# View result
print(df_mean)
This will output the mean values for the “Age”, “Birthday”, “Height” and “Date of Hire” columns:
┌──────┬────────┬────────────┬─────────┬─────────────────────┐
│ Name │    Age │ Birthday   │  Height │ Date of Hire        │
├──────┼────────┼────────────┼─────────┼─────────────────────┤
│ NaN  │  31.67 │ 1992-10-14 │  174.37 │ 2018-01-30 11:52:09 │
└──────┴────────┴────────────┴─────────┴─────────────────────┘
[1 rows x 5 columns]
Note
Only non-null values are included in the calculation of the sum and the total number of values in the column, use SQLDataModel.fillna() to fill null values.
For date and datetime columns, values are converted to Julian days prior to calculation and recast into the original data type; some imprecision may occur as a result.
See SQLDataModel.min() for returning the minimum value, SQLDataModel.max() for maximum value, and SQLDataModel.describe() for descriptive statistical values.
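The Julian day conversion mentioned in the note can be illustrated directly with the standard-library sqlite3 module. This is a sketch of the general technique only, not the package's internal query; the table name `people` and the extra null row are assumptions made for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (age INTEGER, birthday TEXT)")
con.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(30, "1994-06-15"), (28, "1996-11-20"), (37, "1987-01-07"), (None, None)],
)

# AVG skips NULLs; dates are averaged as Julian day numbers, then cast back to a date
mean_age, mean_birthday = con.execute(
    "SELECT AVG(age), date(AVG(julianday(birthday))) FROM people"
).fetchone()
print(round(mean_age, 2), mean_birthday)
```

For the three non-null rows, which match the sample data above, this agrees with the documented result of 31.67 for "Age" and 1992-10-14 for "Birthday".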
- Changelog:
- Version 0.3.7 (2024-04-10):
New method.
- merge(merge_with: SQLDataModel, how: Literal['left', 'right', 'inner', 'full outer', 'cross'] = 'left', left_on: str = None, right_on: str = None, include_join_column: bool = False) SQLDataModel[source]
Merges two SQLDataModel instances based on specified columns and merge type, how, returning the result as a new instance. If the join column shares the same name in both models, left_on and right_on column arguments are not required and will be inferred. Otherwise, explicit arguments for both are required.
- Parameters:
merge_with (SQLDataModel) – The SQLDataModel to merge with the current model.
how (Literal["left", "right", "inner", "full outer", "cross"]) – The type of merge to perform.
left_on (str) – The column name from the current model to use as the left join key.
right_on (str) – The column name from the merge_with model to use as the right join key.
include_join_column (bool) – If the shared column being used as the join key should be included from both tables. Default is False.
- Raises:
TypeError – If merge_with is not of type SQLDataModel.
SQLProgrammingError – If sqlite3 version < 3.39.0 and join type is one of ‘right’ or ‘full outer’, which are unsupported by older versions.
DimensionError – If no shared column exists, and explicit left_on and right_on arguments are not provided.
ValueError – If the specified left_on or right_on column is not found in the respective models.
- Returns:
A new SQLDataModel containing the merged result.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Left table data with ID column
left_headers = ["Name", "Age", "ID"]
left_data = [
    ["Bob", 35, 1],
    ["Alice", 30, 5],
    ["David", 40, None],
    ["Charlie", 25, 2]
]

# Right table data with shared ID column
right_headers = ["ID", "Country"]
right_data = [
    [1, "USA"],
    [2, "Germany"],
    [3, "France"],
    [4, "Latvia"]
]

# Create the left and right tables
df_left = sdm.SQLDataModel(left_data, left_headers)
df_right = sdm.SQLDataModel(right_data, right_headers)
Here are the left and right tables we will be joining:
Left Table:                   Right Table:
┌─────────┬──────┬──────┐     ┌──────┬─────────┐
│ Name    │  Age │   ID │     │   ID │ Country │
├─────────┼──────┼──────┤     ├──────┼─────────┤
│ Bob     │   35 │    1 │     │    1 │ USA     │
│ Alice   │   30 │    5 │     │    2 │ Germany │
│ David   │   40 │      │     │    3 │ France  │
│ Charlie │   25 │    2 │     │    4 │ Latvia  │
└─────────┴──────┴──────┘     └──────┴─────────┘
[4 rows x 3 columns]          [4 rows x 2 columns]
Left Join
# Create a model by performing a left join with the tables
df_joined = df_left.merge(df_right, how="left")

# View result
print(df_joined)
This will output:
Left Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Alice   │   30 │    5 │         │
│ David   │   40 │      │         │
│ Charlie │   25 │    2 │ Germany │
└─────────┴──────┴──────┴─────────┘
[4 rows x 4 columns]
Right Join
# Create a model by performing a right join with the tables
df_joined = df_left.merge(df_right, how="right")

# View result
print(df_joined)
This will output:
Right Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Charlie │   25 │    2 │ Germany │
│         │      │      │ France  │
│         │      │      │ Latvia  │
└─────────┴──────┴──────┴─────────┘
[4 rows x 4 columns]
Inner Join
# Create a model by performing an inner join with the tables
df_joined = df_left.merge(df_right, how="inner")

# View result
print(df_joined)
This will output:
Inner Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Charlie │   25 │    2 │ Germany │
└─────────┴──────┴──────┴─────────┘
[2 rows x 4 columns]
Full Outer Join
# Create a model by performing a full outer join with the tables
df_joined = df_left.merge(df_right, how="full outer")

# View result
print(df_joined)
This will output:
Full Outer Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Alice   │   30 │    5 │         │
│ David   │   40 │      │         │
│ Charlie │   25 │    2 │ Germany │
│         │      │      │ France  │
│         │      │      │ Latvia  │
└─────────┴──────┴──────┴─────────┘
[6 rows x 4 columns]
Cross Join
# Create a model by performing a cross join with the tables
df_joined = df_left.merge(df_right, how="cross")

# View result
print(df_joined)
This will output:
Cross Join:
┌─────────┬──────┬──────┬─────────┐
│ Name    │  Age │   ID │ Country │
├─────────┼──────┼──────┼─────────┤
│ Bob     │   35 │    1 │ USA     │
│ Bob     │   35 │    1 │ Germany │
│ Bob     │   35 │    1 │ France  │
│ Bob     │   35 │    1 │ Latvia  │
│ Alice   │   30 │    5 │ USA     │
│ Alice   │   30 │    5 │ Germany │
│ Alice   │   30 │    5 │ France  │
│ Alice   │   30 │    5 │ Latvia  │
│ David   │   40 │      │ USA     │
│ David   │   40 │      │ Germany │
│ David   │   40 │      │ France  │
│ David   │   40 │      │ Latvia  │
│ Charlie │   25 │    2 │ USA     │
│ Charlie │   25 │    2 │ Germany │
│ Charlie │   25 │    2 │ France  │
│ Charlie │   25 │    2 │ Latvia  │
└─────────┴──────┴──────┴─────────┘
[16 rows x 4 columns]
Note
If include_join_column=False then only the left_on join column is included in the result, with the right_on column removed to avoid redundant shared key values.
If include_join_column=True then all the columns from both models are included in the result, with aliasing to avoid naming conflicts, see utils.alias_duplicates() for details.
The resulting SQLDataModel is created based on the sqlite3 join definition and specified columns and merge type, for details see sqlite3 documentation.
See SQLDataModel.hstack() for horizontally stacking SQLDataModel using shared row dimensions.
See SQLDataModel.vstack() for vertically stacking SQLDataModel using shared column dimensions.
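Because the result follows the sqlite3 join definition, the left join above can be reproduced with plain SQL. The sketch below uses the standard-library sqlite3 module with hypothetical table names `l` and `r`; it illustrates the join semantics and the version requirement, and is not the SQL the package actually generates:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE l (Name TEXT, Age INTEGER, ID INTEGER)")
con.execute("CREATE TABLE r (ID INTEGER, Country TEXT)")
con.executemany(
    "INSERT INTO l VALUES (?, ?, ?)",
    [("Bob", 35, 1), ("Alice", 30, 5), ("David", 40, None), ("Charlie", 25, 2)],
)
con.executemany(
    "INSERT INTO r VALUES (?, ?)",
    [(1, "USA"), (2, "Germany"), (3, "France"), (4, "Latvia")],
)

# RIGHT and FULL OUTER joins require SQLite 3.39.0 or newer
supports_right_join = sqlite3.sqlite_version_info >= (3, 39, 0)

# Keep only the left key column, comparable to include_join_column=False
left_join = con.execute(
    "SELECT l.Name, l.Age, l.ID, r.Country FROM l "
    "LEFT JOIN r ON l.ID = r.ID ORDER BY l.rowid"
).fetchall()
print(left_join)
```

Note that the row with a null ID never matches, since NULL does not compare equal to anything in SQL, which is why "David" has no country in the joined result.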
- Changelog:
- Version 0.10.2 (2024-06-30):
Changed merge_with from keyword argument to positional argument to reflect argument is required and not optional.
- Version 0.10.1 (2024-06-29):
Modified to raise SQLProgrammingError if available sqlite3 version < 3.39.0 and join type is one of ‘right’ or ‘full outer’, which was not supported by older versions.
- Version 0.1.9 (2024-03-19):
New method.
- min() SQLDataModel[source]
Returns a new SQLDataModel containing the minimum value of all non-null values for each column in a row-wise orientation.
- Returns:
A new SQLDataModel containing the minimum non-null value for each column.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Sample data with missing values
headers = ['Name', 'Age', 'Gender', 'Tenure']
data = [
    ('Alice', 25, 'Female', 1.0),
    ('Bob', None, 'Male', 2.7),
    ('Charlie', 30, 'Male', None),
    ('David', None, 'Male', 3.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get minimum values
min_values = df.min()

# View result
print(min_values)
This will output the minimum value of all non-null values for each column:
┌───────┬─────┬────────┬────────┐
│ Name  │ Age │ Gender │ Tenure │
├───────┼─────┼────────┼────────┤
│ Alice │  25 │ Female │   1.00 │
└───────┴─────┴────────┴────────┘
[1 rows x 4 columns]
Note
See SQLDataModel.count_unique() for column-wise count of unique, null and total values for each column.
See SQLDataModel.max() for returning the maximum values in each column.
- Changelog:
- Version 0.3.1 (2024-04-01):
New method.
- min_column_width[source]
The minimum column width in characters to use for string representations of the data. Default is 3.
- Type:
int
- normalize_headers(apply_function: Callable = None) None[source]
Reformats the current SQLDataModel headers into an uncased normalized form using alphanumeric characters only. Wraps SQLDataModel.set_headers().
- Parameters:
apply_function (Callable, optional) – Specify an alternative normalization pattern. When None, the pattern '[^0-9a-z _]+' will be used on uncased values.
- Returns:
None
Example:
import sqldatamodel as sdm

# Create model
df = sdm.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# Use default normalization scheme, uncased and strips invalid SQL identifiers
df.normalize_headers()

# Get renamed headers after default normalization
df.get_headers() # now outputs ['first_name', 'last_name', 'salary']

# Or use custom renaming scheme
df.normalize_headers(lambda x: x.upper())

# Get renamed headers again
df.get_headers() # now outputs ['FIRST_NAME', 'LAST_NAME', 'SALARY']
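To make the default behavior concrete, here is a small sketch of what a normalization function built on the documented pattern might look like. The function name `normalize` is hypothetical, and joining the remaining words with underscores is an assumption inferred from the example output above rather than a confirmed implementation detail:

```python
import re

def normalize(header: str) -> str:
    """Lowercase, drop characters outside [0-9a-z _], then join words with underscores."""
    cleaned = re.sub(r"[^0-9a-z _]+", "", header.lower())
    return "_".join(cleaned.split())

normalized = [normalize(h) for h in ["First Name", "Last Name", "Salary ($)"]]
print(normalized)
```

A callable of this shape, taking one header and returning its replacement, mirrors the `lambda x: x.upper()` form used in the example.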
- Changelog:
- Version 1.2.0 (2025-01-28):
Added duplicate aliasing to prevent post-normalization name collisions using utils.alias_duplicates().
Modified default normalization function to better handle occurrences of multiple invalid characters.
- Version 0.1.5 (2023-11-24):
New method.
- notna() set[int][source]
Return the row indices that do not contain null values from the current model.
- Returns:
Set of row indices containing values that are not null.
- Return type:
set[int]
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Gender', 'City']
data = [
    ('Sarah', 35, 'Female', 'Houston'),
    ('Alice', None, 'Female', 'Milwaukee'),
    ('Mike', None, 'Male', 'Atlanta'),
    ('John', 25, 'Male', 'Boston'),
    ('Bob', None, 'Male', 'Chicago'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter for rows where 'Age' is not null
df = df[df['Age'].notna()]

# View result
print(df)
This will output the result containing the rows where ‘Age’ was not null:
┌───────┬─────┬────────┬─────────┐
│ Name  │ Age │ Gender │ City    │
├───────┼─────┼────────┼─────────┤
│ Sarah │  35 │ Female │ Houston │
│ John  │  25 │ Male   │ Boston  │
└───────┴─────┴────────┴─────────┘
[2 rows x 4 columns]
This can be used in combination with the setitem syntax to selectively update values as well:
# Create a 'Notes' column with a default value
df['Notes'] = 'Missing'

# Filter and set the values that are not null
df[df['Age'].notna(), 'Notes'] = 'Valid'
Note
Whether a value is null or NA-like is determined by the SQL NULL value or the Python equivalent None, checked for any values in the row.
See related SQLDataModel.isna() to filter for rows containing values that are null.
See SQLDataModel.fillna() to fill all missing or null values in the model.
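Since the method returns a set of row indices rather than a boolean mask, it may help to see the idea in plain Python. This is a sketch over a hypothetical dictionary of rows keyed by index, not the package's implementation:

```python
# Hypothetical rows keyed by row index, with None marking missing values
rows = {
    0: ("Sarah", 35),
    1: ("Alice", None),
    2: ("Mike", None),
    3: ("John", 25),
    4: ("Bob", None),
}

def notna_indices(rows):
    """Return the set of row indices whose values are all non-None."""
    return {idx for idx, values in rows.items() if all(v is not None for v in values)}

def isna_indices(rows):
    """Return the complement: indices of rows holding at least one None."""
    return set(rows) - notna_indices(rows)

print(sorted(notna_indices(rows)), sorted(isna_indices(rows)))
```

A set of indices like this is what makes the row filter and the setitem update shown above possible with the same expression.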
- Changelog:
- Version 0.7.2 (2024-06-11):
New method.
- pivot(pivot_column: str, category_column: str, amount_column: str | list[str] = None, fill_value: Any = None, agg_func: Literal['sum', 'avg', 'min', 'max'] = 'sum') SQLDataModel[source]
Create a pivot table using the columns specified and return the result as a new SQLDataModel.
The pivot method transforms the data in the SQLDataModel into a pivot table format summarizing the values of one column amount_column based on unique values from two other columns, the pivot_column and the category_column.
- Parameters:
pivot_column (str) – Column to pivot on. The unique values in this column will form the rows of the pivot table.
category_column (str) – Column to categorize the data. The unique values in this column will form the columns of the pivot table.
amount_column (str, list, optional) – Column(s) to aggregate. Accepts either a single string or a list of strings. Defaults to all numeric columns if not provided.
fill_value (Any, optional) – Value to fill when there is no data for a particular category. Defaults to None.
agg_func (Literal['sum', 'avg', 'min', 'max'], optional) – Aggregate function to use. Defaults to ‘sum’.
- Raises:
TypeError – If arguments for columns are not of type ‘str’, ‘int’ or list, representing column name or integer index.
ValueError – If there are insufficient numeric columns for aggregation, invalid aggregate function, or insufficient distinct values in the category column.
- Returns:
A new SQLDataModel instance containing the result of the pivot operation.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Product', 'Units', 'Qtr', 'Sales']
data = [
    ('Chainsaw', 75, 'Q1', 1500),
    ('Chainsaw', 80, 'Q2', 1600),
    ('Chainsaw', 78, 'Q3', 1550),
    ('Chainsaw', 79, 'Q4', 1580),
    ('Hammer', 40, 'Q1', 800),
    ('Hammer', 42, 'Q2', 850),
    ('Hammer', 41, 'Q3', 820),
    ('Hammer', 42, 'Q4', 830),
    ('Drill', 50, 'Q1', 1000),
    ('Drill', 55, 'Q2', 1100),
    ('Drill', 52, 'Q3', 1050),
    ('Drill', 54, 'Q4', 1080)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Pivot 'Product' by quarterly sales
quarterly_sales = df.pivot('Product', 'Qtr', 'Sales')

# View result
print(quarterly_sales)
This will output a wide pivot table summing up ‘Sales’ by ‘Qtr’ for each ‘Product’:
┌───┬──────────┬──────┬──────┬──────┬──────┐
│   │ Product  │   Q1 │   Q2 │   Q3 │   Q4 │
├───┼──────────┼──────┼──────┼──────┼──────┤
│ 0 │ Chainsaw │ 1500 │ 1600 │ 1550 │ 1580 │
│ 1 │ Drill    │ 1000 │ 1100 │ 1050 │ 1080 │
│ 2 │ Hammer   │  800 │  850 │  820 │  830 │
└───┴──────────┴──────┴──────┴──────┴──────┘
[3 rows x 5 columns]
Multiple columns can be aggregated to produce a wider pivot:
# This time pivot by 'Sales' and 'Units'
quarterly_metrics = df.pivot('Product', 'Qtr', ['Units', 'Sales'])

# View new pivot
print(quarterly_metrics)
When multiple aggregates are used, columns are labeled ‘<category_column> <amount_column>’:
┌───┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│   │ Product  │ Q1 Units │ Q1 Sales │ Q2 Units │ Q2 Sales │ Q3 Units │ Q3 Sales │ Q4 Units │ Q4 Sales │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ 0 │ Chainsaw │       75 │     1500 │       80 │     1600 │       78 │     1550 │       79 │     1580 │
│ 1 │ Drill    │       50 │     1000 │       55 │     1100 │       52 │     1050 │       54 │     1080 │
│ 2 │ Hammer   │       40 │      800 │       42 │      850 │       41 │      820 │       42 │      830 │
└───┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
[3 rows x 9 columns]
Note
When no amount_column is provided, all numeric column types will be used.
All column arguments can also be specified by their integer index as well as their value.
Use fill_value to provide a default value when there is no data for a particular category.
The aggregate function specified in agg_func must be one of ‘sum’, ‘avg’, ‘min’, or ‘max’, with sum used by default.
See SQLDataModel.transpose() for transposing rows and columns directly.
See SQLDataModel.group_by() for regular aggregation.
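A common way to express a pivot in SQL is conditional aggregation, with one aggregate per distinct category value. The sketch below shows that general technique with the standard-library sqlite3 module; whether the package builds its query this way is an assumption, and the table name `sales` and its rows are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (Product TEXT, Qtr TEXT, Sales INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("Hammer", "Q1", 800), ("Hammer", "Q2", 850),
    ("Drill", "Q1", 1000), ("Drill", "Q1", 500), ("Drill", "Q2", 1100),
])

# One output column per distinct value in the category column
quarters = [q for (q,) in con.execute("SELECT DISTINCT Qtr FROM sales ORDER BY Qtr")]
columns = ", ".join("SUM(CASE WHEN Qtr = ? THEN Sales END)" for _ in quarters)

# Group by the pivot column so each product becomes a single row
pivoted = con.execute(
    f"SELECT Product, {columns} FROM sales GROUP BY Product ORDER BY Product",
    quarters,
).fetchall()
print(quarters, pivoted)
```

Swapping SUM for AVG, MIN or MAX corresponds to the other agg_func choices, and wrapping each aggregate in COALESCE would play the role of fill_value.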
- Changelog:
- Version 0.10.3 (2024-07-01):
New method.
- rename_column(column: int | str, new_column_name: str) None[source]
Renames a column in the SQLDataModel at the specified index or using the old column name into the new value provided in new_column_name.
- Parameters:
column (int|str) – The index or current str value of the column to be renamed.
new_column_name (str) – The new name as a str value for the specified column.
- Raises:
TypeError – If the column or new_column_name parameters are invalid types.
IndexError – If the provided column index is outside the current column range.
SQLProgrammingError – If there is an issue with the SQL execution during the column renaming.
Example:
import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27),
    (1, 'sarah', 'west', 29),
    (2, 'mike', 'harlin', 36),
    (3, 'pat', 'douglas', 42)
]

# Create the model with sample data
df = sdm.SQLDataModel(data, headers)

# Example: Rename the 'first' column to 'first_name'
df.rename_column('first', 'first_name')

# Get current values
new_headers = df.get_headers()

# Outputs ['first_name', 'last', 'age']
print(new_headers)
Note
The method allows renaming a column identified by its index in the SQLDataModel.
Handles negative indices by adjusting them relative to the end of the column range.
If an error occurs during SQL execution, it rolls back the changes and raises a SQLProgrammingError with an informative message.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- rename_header(header: int | str, new_header_name: str) None[source]
Renames a header in the SQLDataModel at the specified index or using the old header name into the new value provided in new_header_name.
- Parameters:
header (int|str) – The index or current str value of the header to be renamed.
new_header_name (str) – The new name as a str value for the specified header.
- Raises:
TypeError – If the header or new_header_name parameters are invalid types.
IndexError – If the provided header index is outside the current column range.
SQLProgrammingError – If there is an issue with the SQL execution during the header renaming.
Example:
import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27),
    (1, 'sarah', 'west', 29),
    (2, 'mike', 'harlin', 36),
    (3, 'pat', 'douglas', 42)
]

# Create the model with sample data
df = sdm.SQLDataModel(data, headers)

# Example: Rename the 'first' header to 'first_name'
df.rename_header('first', 'first_name')

# Get current values
new_headers = df.get_headers()

# Outputs ['first_name', 'last', 'age']
print(new_headers)
Note
The method allows renaming a column identified by its index in the SQLDataModel.
Handles negative indices by adjusting them relative to the end of the column range.
If an error occurs during SQL execution, it rolls back the changes and raises a SQLProgrammingError with an informative message.
- Changelog:
- Version 1.2.0 (2025-01-28):
New convenience method wrapping SQLDataModel.rename_column() to reflect additional naming convention.
- rename_headers(new_headers: list[str] | Callable[[list[str]], list[str]]) None[source]
Renames the current SQLDataModel headers to values provided in new_headers. Headers must match the existing column count.
- Parameters:
new_headers (list[str] or Callable[[list[str]], list[str]]) – A sequence (e.g., list, tuple) of new header names or a callable that takes the existing headers and returns a new list of header names.
- Raises:
TypeError – If the new_headers type is not a valid type (list or tuple) or contains instances that are not of type ‘str’.
DimensionError – If the length of new_headers does not match the column count.
TypeError – If the type of the first element in new_headers is not a valid type (str, int, or float).
- Returns:
None
Example:
import sqldatamodel as sdm

# Create model
df = sdm.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# Set new headers
df.rename_headers(['First_Name', 'Last_Name', 'Payment'])

# Alternatively, provide a callable argument to transform headers using existing names
df.rename_headers(lambda headers: [header.replace(' ', '_') for header in headers])
- Changelog:
- Version 1.2.0 (2025-01-28):
New convenience method wrapping SQLDataModel.set_headers() to reflect additional naming convention.
- replace(pattern: str, replacement: str, inplace: bool = False, **kwargs) SQLDataModel[source]
Replaces matching occurrences of a specified pattern with a replacement value in the SQLDataModel instance. If inplace is True, the method updates the existing SQLDataModel; otherwise, it returns a new SQLDataModel with the replacements applied.
- Parameters:
pattern (str) – The substring or regular expression pattern to search for in each column.
replacement (str) – The string to replace the matched pattern with.
inplace (bool, optional) – If True, modifies the current SQLDataModel instance in-place. Default is False.
**kwargs – Additional keyword arguments to be passed to the execute_fetch method when not in-place.
- Raises:
TypeError – If the pattern or replacement parameters are invalid types.
- Returns:
If inplace=True, modifies the current instance in-place and returns None. Otherwise, returns a new SQLDataModel with the specified replacements applied.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

headers = ['first', 'last', 'age', 'service']
data = [
    ('John', 'Smith', 27, 1.22),
    ('Sarah', 'West', 39, 0.7),
    ('Mike', 'Harlin', 36, 3),
    ('Pat', 'Douglas', 42, 11.5)
]

# Create the model
df = sdm.SQLDataModel(data, headers, display_float_precision=2, display_index=False)

# Replace 'John' in the 'first' column
df['first'] = df['first'].replace("John", "Jane")

# View model
print(df)
This will output:
┌───────┬─────────┬──────┬─────────┐
│ first │ last    │  age │ service │
├───────┼─────────┼──────┼─────────┤
│ Jane  │ Smith   │   27 │    1.22 │
│ Sarah │ West    │   39 │    0.70 │
│ Mike  │ Harlin  │   36 │    3.00 │
│ Pat   │ Douglas │   42 │   11.50 │
└───────┴─────────┴──────┴─────────┘
[4 rows x 4 columns]
Note
See SQLDataModel.contains() for identifying substring presence.
See SQLDataModel.endswith() for checking end of value for substring.
See SQLDataModel.startswith() for checking beginning of value for substring.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- reset_index(start_index: int = 0) None[source]
Resets the index of the SQLDataModel instance inplace to zero-based sequential autoincrement, or to specified start_index base with sequential incrementation.
- Parameters:
start_index (int, optional) – The starting index for the reset operation. Defaults to 0.
- Raises:
TypeError – If provided start_index argument is not of type int.
ValueError – If the specified start_index is greater than the minimum index in the current model.
SQLProgrammingError – If reset index execution results in constraint violation or programming error.
Example:
import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age', 'service']
data = [
    (994, 'john', 'smith', 27, 1.22),
    (995, 'sarah', 'west', 39, 0.7),
    (996, 'mike', 'harlin', 36, 3),
    (997, 'pat', 'douglas', 42, 11.5)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# View current state
print(df)
This will output:
┌─────┬────────┬─────────┬────────┬─────────┐
│     │ first  │ last    │    age │ service │
├─────┼────────┼─────────┼────────┼─────────┤
│ 994 │ john   │ smith   │     27 │    1.22 │
│ 995 │ sarah  │ west    │     39 │    0.70 │
│ 996 │ mike   │ harlin  │     36 │    3.00 │
│ 997 │ pat    │ douglas │     42 │   11.50 │
└─────┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]
Now reset the index column:
# Reset the index with default start value
df.reset_index()

# View updated model
print(df)
This will output:
┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john   │ smith   │     27 │    1.22 │
│ 1 │ sarah  │ west    │     39 │    0.70 │
│ 2 │ mike   │ harlin  │     36 │    3.00 │
│ 3 │ pat    │ douglas │     42 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]
Reset the index to a custom value:
# Reset the index with a different value
df.reset_index(start_index=-3)

# View updated model
print(df)
This will output:
┌────┬────────┬─────────┬────────┬─────────┐
│    │ first  │ last    │    age │ service │
├────┼────────┼─────────┼────────┼─────────┤
│ -3 │ john   │ smith   │     27 │    1.22 │
│ -2 │ sarah  │ west    │     39 │    0.70 │
│ -1 │ mike   │ harlin  │     36 │    3.00 │
│  0 │ pat    │ douglas │     42 │   11.50 │
└────┴────────┴─────────┴────────┴─────────┘
[4 rows x 4 columns]
Note
The current index should be viewed more as a soft row number, to assign hard indices use the SQLDataModel.freeze_index() method.
Setting start_index to a very large negative or positive integer may lead to unpredictable behavior.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- sample(n_samples: float | int = 0.05, **kwargs) SQLDataModel[source]
Return a random sample of size n_samples as a new SQLDataModel.
- Parameters:
n_samples (float | int) – Number of rows or proportion of rows to sample. Default set to 0.05, proportional to 5% of the current SQLDataModel.row_count. If n_samples is an integer, it represents the exact number of rows to sample where 0 < n_samples <= row_count. If n_samples is a float, it represents the proportion of rows to sample where 0.0 < n_samples <= 1.0.
- Returns:
A new SQLDataModel instance containing the sampled rows.
- Return type:
SQLDataModel
- Raises:
TypeError – If the n_samples parameter is not of type ‘int’ or ‘float’.
ValueError – If the n_samples value is invalid or out of range.
This method generates a random sample of rows from the current SQLDataModel. The number of rows to sample can be specified either as an integer representing the exact number of rows or as a float representing the proportion of rows to sample. The sampled rows are returned as a new SQLDataModel instance.
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Amount'])

# Example 1: Sample 10 random rows
sample_result = df.sample(n_samples=10)

# Create the model
df2 = sdm.from_csv('another_example.csv', headers=['Code', 'Description', 'Price'])

# Example 2: Sample 20% of rows
sample_result2 = df2.sample(n_samples=0.2)
Note
If the current model’s SQLDataModel.row_count value is less than the sample size, the current row count will be used instead.
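The integer and float interpretations of n_samples can be summarized in a few lines. The sketch below is a hypothetical helper, `resolve_sample_size`, written from the rules documented above; the rounding used for proportions is an assumption, and the package may resolve fractional counts differently:

```python
import random

def resolve_sample_size(n_samples, row_count):
    """Translate n_samples into a concrete row count following the documented rules."""
    if isinstance(n_samples, bool) or not isinstance(n_samples, (int, float)):
        raise TypeError("n_samples must be of type 'int' or 'float'")
    if isinstance(n_samples, float):
        if not 0.0 < n_samples <= 1.0:
            raise ValueError("float n_samples must satisfy 0.0 < n_samples <= 1.0")
        # Rounding rule is assumed; at least one row is kept when rows exist
        return min(row_count, max(1, round(n_samples * row_count)))
    if n_samples <= 0:
        raise ValueError("int n_samples must be positive")
    # Cap at the available rows, as described in the note
    return min(n_samples, row_count)

rows = list(range(200))
sampled = random.sample(rows, resolve_sample_size(0.05, len(rows)))
print(len(sampled))
```

With 200 rows, the default proportion of 0.05 resolves to 10 rows, while an oversized integer request falls back to the full row count.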
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- set_column_alignment(alignment: Literal['dynamic', 'left', 'center', 'right'] = 'dynamic') None[source]
Sets the default alignment behavior for SQLDataModel when repr or print is called, modifies column_alignment attribute. Default behavior set to 'dynamic', which right-aligns numeric data types, left-aligns all other types, with headers matching value alignment.
- Parameters:
alignment (str) – The column alignment setting to use.
'dynamic': Default behavior, dynamically aligns columns based on column data types.
'left': Left-align all column values.
'center': Center-align all column values.
'right': Right-align all column values.
- Raises:
TypeError – If the argument for alignment is not of type ‘str’.
ValueError – If the provided alignment is not one of ‘dynamic’, ‘left’, ‘center’, ‘right’.
- Returns:
None
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Set to right-align columns
df.set_column_alignment('right')

# Output
print(df)
This will output the model with values right-aligned:
┌───┬────────┬─────────┬────────┬─────────┐
│   │  first │    last │    age │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │   john │   smith │     27 │    1.22 │
│ 1 │  sarah │    west │     39 │    0.70 │
│ 2 │   mike │  harlin │     36 │    3.00 │
│ 3 │    pat │ douglas │     42 │   11.50 │
└───┴────────┴─────────┴────────┴─────────┘
Setting columns to be left-aligned:
# Set to left-align
df.set_column_alignment('left')

# Output
print(df)
This will output the model with left-aligned values instead:
┌───┬────────┬─────────┬────────┬─────────┐
│   │ first  │ last    │ age    │ service │
├───┼────────┼─────────┼────────┼─────────┤
│ 0 │ john   │ smith   │ 27     │ 1.22    │
│ 1 │ sarah  │ west    │ 39     │ 0.70    │
│ 2 │ mike   │ harlin  │ 36     │ 3.00    │
│ 3 │ pat    │ douglas │ 42     │ 11.50   │
└───┴────────┴─────────┴────────┴─────────┘
Note
Use SQLDataModel.get_column_alignment() to return the current column alignment setting.
When using ‘center’, if the column contents cannot be perfectly centralized, the left side will be favored.
Use ‘dynamic’ to return to default column alignment, which is right-aligned for numeric types and left-aligned for others.
See SQLDataModel.set_table_style() for modifying table format and available styles.
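The four alignment settings map naturally onto Python's format specification characters. The following is a small sketch of that mapping, using a hypothetical `align_cell` helper; it illustrates the described behavior and is not taken from the package source:

```python
# Format-spec alignment characters for the three explicit settings
ALIGN = {"left": "<", "center": "^", "right": ">"}

def align_cell(value, width, alignment="dynamic"):
    """Pad value to width; 'dynamic' right-aligns numbers and left-aligns the rest."""
    if alignment == "dynamic":
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        alignment = "right" if is_number else "left"
    return format(str(value), f"{ALIGN[alignment]}{width}")

for setting in ("dynamic", "left", "center", "right"):
    print(setting, [align_cell(v, 6, setting) for v in ("pat", 42)])
```

When the padding is odd, Python's centering places the extra space on the right, so the text sits toward the left, in line with the note about ‘center’ above.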
- Changelog:
- Version 0.1.80 (2024-02-24):
Changed expected values for alignment parameter from f-string modifiers to more descriptive values ‘dynamic’, ‘left’, ‘center’ or ‘right’.
- set_column_dtypes(column: str | int | dict, dtype: Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str'] = None) None[source]
Casts the specified column into the provided python dtype using the equivalent SQL data type.
- Parameters:
column (str or int or dict) – The name or index of the column to be cast, or a dictionary mapping column names to dtypes. If a dictionary, keys are column names or indices and values are the dtypes.
dtype (Literal['bool', 'bytes', 'date', 'datetime', 'float', 'int', 'None', 'str']) – The target Python data type for the specified column. Ignored if column is a dictionary.
- Raises:
TypeError – If column is not of type ‘str’, ‘int’, or ‘dict’, or if any dtype is invalid.
IndexError – If column is an integer and the index is outside of the current model range.
ValueError – If column is a string and the column is not found in the current model.
- Returns:
The model’s data types are cast to the new type and nothing is returned.
- Return type:
None
Example:
import sqldatamodel as sdm

# Sample data
headers = ['idx', 'First', 'Last', 'Age']
data = [
    (0, 'John', 'Smith', 27),
    (1, 'Sarah', 'West', 29),
    (2, 'Mike', 'Harlin', 36),
    (3, 'Pat', 'Douglas', 42),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Original dtype for comparison
old_dtype = df.get_column_dtypes('Age')

# Set the data type of the 'Age' column to 'float'
df.set_column_dtypes('Age', 'float')

# Confirm column dtype
new_dtype = df.get_column_dtypes('Age')

# View result
print(f"Age dtype: {old_dtype} -> {new_dtype}")
This will output:
Age dtype: int -> float
Warning
Type casting will coerce any nonconforming values to the dtype being set, this means data will be lost if cast incorrectly.
Note
Column data types are mapped to SQL types and not Python class types, see sqlite3 docs for additional information.
See SQLDataModel.infer_dtypes() to automatically infer the correct column data types using random sampling.
- Changelog:
- Version 0.7.9 (2024-06-20):
Modified to allow column argument to be provided as a dictionary mapping column names to dtypes to reflect current structure at SQLDataModel.dtypes.
- Version 0.1.9 (2024-03-19):
New method.
- set_display_color(color: str | tuple = None, rand_color: bool = False) None[source]
Sets the table string representation color when SQLDataModel is displayed in the terminal, selecting a random color if rand_color = True.
- Parameters:
color (str or tuple) – Color to set. Accepts hex value (e.g., '#A6D7E8') or tuple of RGB values (e.g., (166, 215, 232)). When not provided or color = None, the terminal default color will be used.
rand_color (bool, optional) – Set a random color from a preselected pool of options. When True, color will be ignored and a random color selected instead.
- Returns:
The color value is set at
SQLDataModel.display_colorand nothing is returned.- Return type:
None
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['Name', 'Age', 'Salary'])

# Set color using hex value
df.set_display_color('#A6D7E8')

# Set color using rgb value
df.set_display_color((166, 215, 232))
If you’re unsure of which color to use, have one selected for you:
# Surprise me! Use a random color
df.set_display_color(rand_color=True)

# View the value set
print(df.display_color)
In this case we got a nice ‘plum’ color:
ANSIColor('#F6A8CC')
To reset to the default terminal color:
# Set color to None
df.set_display_color(color=None)

# View the value set
print(df.display_color) # None
This prints None, signifying that the default terminal color will be used.
Note
By default, no color styling is applied and the native terminal color is used.
To use rgb values, ensure a single tuple is provided as an argument.
When rand_color = True a random color is selected from a preexisting pool, see SQLDataModel.ANSIColor.ANSIColor.rand_color() for more details.
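The two accepted color forms describe the same value, as a short stand-alone sketch can show. The helpers hex_to_rgb and colorize are hypothetical names used for illustration, and the 24-bit ANSI escape sequence is an assumption about how a terminal color might be applied, not a description of the package's ANSIColor class.

```python
def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert a '#RRGGBB' string into an (R, G, B) tuple of ints."""
    digits = hex_color.lstrip('#')
    if len(digits) != 6:
        raise ValueError(f"expected '#RRGGBB', got {hex_color!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

def colorize(text: str, rgb: tuple[int, int, int]) -> str:
    """Wrap text in a 24-bit ANSI foreground color sequence, then reset."""
    r, g, b = rgb
    return f"\033[38;2;{r};{g};{b}m{text}\033[0m"

# The hex value and the RGB tuple from the parameter description match
rgb = hex_to_rgb('#A6D7E8')
print(rgb)
print(colorize('colored table text', rgb))
```

Terminals without truecolor support may ignore or approximate the sequence.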
- Changelog:
- Version 0.12.0 (2024-07-06):
Added rand_color argument to require explicit selection of a random color, and changed color = None to instead reset the color to the terminal default.
- Version 0.10.2 (2024-06-30):
Modified to randomly select a color from preselected pool when color = None for demonstration purposes, see SQLDataModel.ANSIColor for more details.
- Version 0.7.0 (2024-06-08):
Removed warning message and modified to raise exception on failure to create display color pen.
- Version 0.1.5 (2023-11-24):
New method.
- set_display_float_precision(float_precision: int) None[source]
Sets the current float display precision to the specified value for use in the repr method of the SQLDataModel when representing float data types. Note that this precision limit is overridden by the max_column_width value if the precision limit exceeds the specified maximum width.
- Parameters:
float_precision (int) – The desired float display precision to be used for real number values.
- Raises:
TypeError – If the float_precision argument is not of type ‘int’.
ValueError – If the float_precision argument is a negative value, as it must be a valid f-string precision identifier.
- Returns:
None
Example:
import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age', 'service_time']
data = [
    (0, 'john', 'smith', 27, 2.1),
    (1, 'sarah', 'west', 29, 0.8),
    (2, 'mike', 'harlin', 36, 1.3),
    (3, 'pat', 'douglas', 42, 7.02)
]

# Create the model with sample data
df = sdm.SQLDataModel(data, headers)

# Example: Set the float display precision to 2
df.set_display_float_precision(2)

# View model
print(df)
This will output:
┌───┬────────┬─────────┬────────┬──────────────┐
│   │ first  │ last    │    age │ service_time │
├───┼────────┼─────────┼────────┼──────────────┤
│ 0 │ john   │ smith   │     27 │         2.10 │
│ 1 │ sarah  │ west    │     29 │         0.80 │
│ 2 │ mike   │ harlin  │     36 │         1.30 │
│ 3 │ pat    │ douglas │     42 │         7.02 │
└───┴────────┴─────────┴────────┴──────────────┘
Use SQLDataModel.get_display_float_precision() to get the current value set:
# Get the updated float display precision
updated_precision = df.get_display_float_precision()

# Outputs 2
print(updated_precision)
Note
The display_float_precision attribute only affects the precision for displaying real or floating point values.
The actual precision of the stored value in the model is unaffected by the value set.
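The distinction between displayed and stored precision can be seen with plain f-string formatting, which the Raises section mentions as the basis for valid precision values. This is a minimal sketch with hypothetical sample values, not the package's rendering code.

```python
stored = [2.1, 0.8, 1.3, 7.0234]
precision = 2  # hypothetical display precision

# Format for display only; the stored floats are left untouched
displayed = [f"{value:.{precision}f}" for value in stored]

print(displayed)
print(stored)
```

Formatting yields new strings, so the original floats keep their full precision.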
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- set_display_index(display_index: bool) None[source]
Sets the value for SQLDataModel.display_index to enable or disable the inclusion of the SQLDataModel index value in print or repr calls.
- Parameters:
display_index (bool) – Whether or not to include the index in SQLDataModel representations.
- Raises:
TypeError – If the provided argument is not a boolean value.
- Returns:
None
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Disable displaying index
df.set_display_index(False)
Note
Use SQLDataModel.set_table_style() to more broadly modify the appearance and formatting style of SQLDataModel string representations.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- set_display_max_rows(rows: int | None) None[source]
Sets value at SQLDataModel.display_max_rows to limit the maximum rows displayed when repr or print is called. Use rows = None to derive the maximum number of rows to display from the current terminal height.
- Parameters:
rows (int | None) – The maximum number of rows to display, or None to derive it from the terminal height.
- Raises:
TypeError – If the provided argument is neither None nor an integer.
IndexError – If the provided value is an integer less than or equal to 0.
- Returns:
None
Example:
import sqldatamodel as sdm

# Create model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Any call to `print` or `repr` will be restricted to 500 max rows
df.set_display_max_rows(500)

# Alternatively, auto-detect dimensions by setting to `None`
df.set_display_max_rows(None)
Note
Modifying SQLDataModel.display_max_rows does not affect the actual number of rows in the model, only the maximum rows displayed.
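One way a row limit could be derived from the terminal height is sketched below using shutil.get_terminal_size from the standard library. The helper rows_that_fit and its reserved_lines allowance are hypothetical; the package's own calculation is not documented here and may differ.

```python
import shutil

def rows_that_fit(reserved_lines: int = 6, fallback=(80, 24)) -> int:
    """Estimate how many table rows fit in the current terminal.

    reserved_lines is a hypothetical allowance for borders, the header
    row and the trailing dimensions line; it is not taken from the library.
    """
    height = shutil.get_terminal_size(fallback=fallback).lines
    return max(1, height - reserved_lines)

print(rows_that_fit())
```

The fallback size applies when no terminal is attached, for example when output is piped to a file.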
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- set_headers(new_headers: list[str] | dict[str | int, str] | Callable[[list[str]], list[str]]) None[source]
Renames the current SQLDataModel headers to values in new_headers, which can be provided as:
A sequence (list/tuple) of strings matching the full column count.
A dictionary mapping existing indices (int) or names (str) to new names (str).
A callable that transforms the existing header list.
- Parameters:
new_headers (list[str] | dict[str|int, str] | Callable) – The new header configuration.
If list/tuple: Must be the same length as the column count.
If dict: Keys can be current column names or indices (supporting negative indexing). Values are new names. Unspecified columns remain unchanged.
If callable: Receives the current headers list and returns the new headers list.
- Raises:
TypeError – If new_headers contains invalid types.
DimensionError – If the length of the resulting header list does not match the column count.
IndexError – If a dictionary key is an integer index out of bounds.
ValueError – If a dictionary key is a string name not found in current headers.
SQLProgrammingError – If an error is encountered while attempting to rename the SQL columns.
- Returns:
None
Example:
import sqldatamodel as sdm

# Create model
df = sdm.from_csv('example.csv', headers=['First Name', 'Last Name', 'Salary'])

# 1. Rename all using list
df.set_headers(['First_Name', 'Last_Name', 'Payment'])

# 2. Rename specific columns using dict (mixed keys allowed)
# Rename 'Payment' (index -1) to 'Annual_Salary' and 'First_Name' to 'FName'
df.set_headers({-1: 'Annual_Salary', 'First_Name': 'FName'})

# 3. Transform using callable
df.set_headers(lambda headers: [h.upper() for h in headers])
- Changelog:
- Version 2.2.0 (2025-11-28):
Added support for dict input to allow partial renaming and mixed index/name keys.
- Version 2.1.1 (2025-11-25):
Modified to retain original column ordering when existing model headers match a subset of new_headers.
- Version 1.2.0 (2025-01-28):
Added ability to provide a callable for new_headers.
- Version 0.1.5 (2023-11-24):
New method.
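The three accepted forms of new_headers can be reduced to one full header list before any renaming happens. The function below is a hypothetical pure-Python sketch of that resolution step, written from the parameter and exception descriptions above; it is not the package's implementation, and it raises ValueError where the package documents its own DimensionError.

```python
from typing import Callable, Sequence, Union

HeaderSpec = Union[Sequence[str], dict, Callable[[list], list]]

def resolve_headers(current: list, new_headers: HeaderSpec) -> list:
    """Return the full header list produced by a list, dict or callable spec."""
    if callable(new_headers):
        resolved = list(new_headers(list(current)))
    elif isinstance(new_headers, dict):
        resolved = list(current)
        for key, new_name in new_headers.items():
            if isinstance(key, int):
                # Negative indices count from the end, as with list indexing
                if not -len(current) <= key < len(current):
                    raise IndexError(f"column index {key} is out of range")
                position = key % len(current)
            elif isinstance(key, str):
                if key not in current:
                    raise ValueError(f"column {key!r} not found in headers")
                position = current.index(key)
            else:
                raise TypeError(f"invalid key type: {type(key).__name__}")
            resolved[position] = new_name
    elif isinstance(new_headers, (list, tuple)):
        resolved = list(new_headers)
    else:
        raise TypeError(f"invalid type: {type(new_headers).__name__}")
    if not all(isinstance(h, str) for h in resolved):
        raise TypeError("all headers must be of type 'str'")
    if len(resolved) != len(current):
        # The library documents a DimensionError here; ValueError stands in
        raise ValueError("header count does not match column count")
    return resolved

headers = ['First_Name', 'Last_Name', 'Payment']
headers = resolve_headers(headers, {-1: 'Annual_Salary', 'First_Name': 'FName'})
print(headers)
headers = resolve_headers(headers, lambda hs: [h.upper() for h in hs])
print(headers)
```

Resolving everything to a full list first means the length check and the type check only need to be written once.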
- set_max_column_width(width: int) None[source]
Set max_column_width as the maximum number of characters per column when repr or print is called.
- Parameters:
width (int) – The maximum width for each column.
- Returns:
Sets the max_column_width property.
- Return type:
None
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Change the max column width for the table representation
df.set_max_column_width(20)
Note
If max_column_width is set to a value below the current min_column_width property, the maximum width will override the minimum width.
The minimum required width is 2; when max_column_width < 2, 2 will be used regardless of the width provided.
See SQLDataModel.set_min_column_width() to set the minimum column width for table representations.
- set_min_column_width(width: int) None[source]
Set min_column_width as the minimum number of characters per column when repr or print is called.
- Parameters:
width (int) – The minimum width for each column.
- Returns:
Sets the min_column_width property.
- Return type:
None
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['ID', 'Name', 'Value'])

# Set a new minimum column width value
df.set_min_column_width(8)

# Check updated value
print(df.min_column_width) # 8
Note
If min_column_width is set to a value above the current max_column_width property, the maximum width will override the minimum width.
The minimum required width is 2; when min_column_width < 2, 2 will be used regardless of the width provided.
See SQLDataModel.set_max_column_width() to set the maximum column width for table representations.
- set_model_name(new_name: str) None[source]
Sets the new SQLDataModel table name that will be used as an alias for any SQL queries executed by the user or internally.
- Parameters:
new_name (str) – The new table name for the SQLDataModel.
- Raises:
SQLProgrammingError – If unable to rename the model table due to SQL execution failure.
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.from_csv('example.csv', headers=['Column1', 'Column2'])

# Rename the model
df.set_model_name('custom_table')
Note
The provided value must be a valid SQL table name.
This alias will be reset to the default value for any new SQLDataModel instances: 'sdm'.
- Changelog:
- Version 0.1.5 (2023-11-24):
New method.
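Renaming a table in SQLite comes down to a single ALTER TABLE statement, sketched below with the standard sqlite3 module. The table layout and the rename_table helper are hypothetical; whether the package issues exactly this statement is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "sdm" ("idx" INTEGER PRIMARY KEY, "Column1" TEXT)')
conn.execute('INSERT INTO "sdm" ("Column1") VALUES (?)', ("value",))

def rename_table(connection, old_name: str, new_name: str) -> None:
    """Rename a table, quoting identifiers since they cannot be bound as parameters."""
    def quote(name):
        return '"' + name.replace('"', '""') + '"'
    connection.execute(f"ALTER TABLE {quote(old_name)} RENAME TO {quote(new_name)}")

rename_table(conn, "sdm", "custom_table")

# The data is now reachable under the new name only
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
rows = conn.execute('SELECT "Column1" FROM "custom_table"').fetchall()
print(tables, rows)
```

Queries that still refer to the old name fail after the rename, so any stored SQL has to be updated to match.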
- set_table_style(style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = 'default') None[source]
Sets the table style used for string representations of SQLDataModel.
- Parameters:
style (Literal['ascii','bare','dash','default','double','latex','list','markdown','outline','pandas','polars','postgresql','round','rst-grid','rst-simple']) – The table styling to set. Setting to 'default' style will return the style representation to the original format.
- Raises:
ValueError – If style provided is not one of the currently supported options ‘ascii’, ‘bare’, ‘dash’, ‘default’, ‘double’, ‘latex’, ‘list’, ‘markdown’, ‘outline’, ‘pandas’, ‘polars’, ‘postgresql’, ‘round’, ‘rst-grid’ or ‘rst-simple’.
- Returns:
None
Examples:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height', 'Birthday']
data = [
    ('Alice', 28, 162.08, '1996-11-20'),
    ('Bobby', 30, 175.36, '1994-06-15'),
    ('Craig', 37, 185.82, '1987-01-07'),
    ('David', 32, 179.75, '1992-12-28')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Let's try the round style
df.set_table_style('round')

# View it
print(df)
This outputs the 'round' table style:
╭───────┬─────┬─────────┬────────────╮
│ Name  │ Age │  Height │ Birthday   │
├───────┼─────┼─────────┼────────────┤
│ Alice │  28 │  162.08 │ 1996-11-20 │
│ Bobby │  30 │  175.36 │ 1994-06-15 │
│ Craig │  37 │  185.82 │ 1987-01-07 │
│ David │  32 │  179.75 │ 1992-12-28 │
╰───────┴─────┴─────────┴────────────╯
Alternatively, set style = 'ascii' to format SQLDataModel in the ASCII style, the OG of terminal tables:
+-------+-----+---------+------------+
| Name  | Age |  Height | Birthday   |
+-------+-----+---------+------------+
| Alice |  28 |  162.08 | 1996-11-20 |
| Bobby |  30 |  175.36 | 1994-06-15 |
| Craig |  37 |  185.82 | 1987-01-07 |
| David |  32 |  179.75 | 1992-12-28 |
+-------+-----+---------+------------+
Set style = 'bare' to format SQLDataModel in the following style:
Name   Age   Height  Birthday
-------------------------------
Alice   28   162.08  1996-11-20
Bobby   30   175.36  1994-06-15
Craig   37   185.82  1987-01-07
David   32   179.75  1992-12-28
Set style = 'dash' to format SQLDataModel with dashes for internal borders:
┌───────┬─────┬─────────┬────────────┐
│ Name  ╎ Age ╎  Height ╎ Birthday   │
├╴╴╴╴╴╴╴┼╴╴╴╴╴┼╴╴╴╴╴╴╴╴╴┼╴╴╴╴╴╴╴╴╴╴╴╴┤
│ Alice ╎  28 ╎  162.08 ╎ 1996-11-20 │
│ Bobby ╎  30 ╎  175.36 ╎ 1994-06-15 │
│ Craig ╎  37 ╎  185.82 ╎ 1987-01-07 │
│ David ╎  32 ╎  179.75 ╎ 1992-12-28 │
└───────┴─────┴─────────┴────────────┘
Set style = 'default' to format SQLDataModel in the following style, which also happens to be the default styling applied:
┌───────┬─────┬─────────┬────────────┐
│ Name  │ Age │  Height │ Birthday   │
├───────┼─────┼─────────┼────────────┤
│ Alice │  28 │  162.08 │ 1996-11-20 │
│ Bobby │  30 │  175.36 │ 1994-06-15 │
│ Craig │  37 │  185.82 │ 1987-01-07 │
│ David │  32 │  179.75 │ 1992-12-28 │
└───────┴─────┴─────────┴────────────┘
[4 rows x 4 columns]
Set style = 'list' to format SQLDataModel as a list of values, similar to the SQLite CLI representation:
Name  Age  Height Birthday
----- --- ------- ----------
Alice  28  162.08 1996-11-20
Bobby  30  175.36 1994-06-15
Craig  37  185.82 1987-01-07
David  32  179.75 1992-12-28
Set style = 'double' to format SQLDataModel using double line borders:
╔═══════╦═════╦═════════╦════════════╗
║ Name  ║ Age ║  Height ║ Birthday   ║
╠═══════╬═════╬═════════╬════════════╣
║ Alice ║  28 ║  162.08 ║ 1996-11-20 ║
║ Bobby ║  30 ║  175.36 ║ 1994-06-15 ║
║ Craig ║  37 ║  185.82 ║ 1987-01-07 ║
║ David ║  32 ║  179.75 ║ 1992-12-28 ║
╚═══════╩═════╩═════════╩════════════╝
Set style = 'markdown' to format SQLDataModel in the Markdown style:
| Name  | Age |  Height | Birthday   |
|-------|-----|---------|------------|
| Alice |  28 |  162.08 | 1996-11-20 |
| Bobby |  30 |  175.36 | 1994-06-15 |
| Craig |  37 |  185.82 | 1987-01-07 |
| David |  32 |  179.75 | 1992-12-28 |
Set style = 'outline' to format SQLDataModel in the following style:
┌─────────────────────────────────┐
│ Name   Age   Height  Birthday   │
├─────────────────────────────────┤
│ Alice   28   162.08  1996-11-20 │
│ Bobby   30   175.36  1994-06-15 │
│ Craig   37   185.82  1987-01-07 │
│ David   32   179.75  1992-12-28 │
└─────────────────────────────────┘
Set style = 'pandas' to format SQLDataModel in the style used by Pandas DataFrames:
Name   Age   Height  Birthday
Alice   28   162.08  1996-11-20
Bobby   30   175.36  1994-06-15
Craig   37   185.82  1987-01-07
David   32   179.75  1992-12-28
Set style = 'polars' to format SQLDataModel in the style used by Polars DataFrames:
┌───────┬─────┬─────────┬────────────┐
│ Name  ┆ Age ┆  Height ┆ Birthday   │
╞═══════╪═════╪═════════╪════════════╡
│ Alice ┆  28 ┆  162.08 ┆ 1996-11-20 │
│ Bobby ┆  30 ┆  175.36 ┆ 1994-06-15 │
│ Craig ┆  37 ┆  185.82 ┆ 1987-01-07 │
│ David ┆  32 ┆  179.75 ┆ 1992-12-28 │
└───────┴─────┴─────────┴────────────┘
Set style = 'postgresql' to format SQLDataModel in the style used by PostgreSQL:
Name  | Age |  Height | Birthday
------+-----+---------+-----------
Alice |  28 |  162.08 | 1996-11-20
Bobby |  30 |  175.36 | 1994-06-15
Craig |  37 |  185.82 | 1987-01-07
David |  32 |  179.75 | 1992-12-28
Set style = 'rst-grid' to format SQLDataModel in the style required for Sphinx and reStructuredText grid tables:
+-------+-----+---------+------------+
| Name  | Age |  Height | Birthday   |
+=======+=====+=========+============+
| Alice |  28 |  162.08 | 1996-11-20 |
| Bobby |  30 |  175.36 | 1994-06-15 |
| Craig |  37 |  185.82 | 1987-01-07 |
| David |  32 |  179.75 | 1992-12-28 |
+-------+-----+---------+------------+
Set style = 'rst-simple' to format SQLDataModel in the style required for Sphinx and reStructuredText simple tables:
===== === ======= ==========
Name  Age  Height Birthday
===== === ======= ==========
Alice  28  162.08 1996-11-20
Bobby  30  175.36 1994-06-15
Craig  37  185.82 1987-01-07
David  32  179.75 1992-12-28
===== === ======= ==========
Set style = 'latex' to format SQLDataModel in the style of a LaTeX table:
\hline
Name  & Age &  Height & Birthday   \\
\hline
Alice &  28 &  162.08 & 1996-11-20 \\
Bobby &  30 &  175.36 & 1994-06-15 \\
Craig &  37 &  185.82 & 1987-01-07 \\
David &  32 &  179.75 & 1992-12-28 \\
\hline
However, SQLDataModel.to_latex() should be used to format complete table elements for LaTeX files.
Note
The labels given to certain styles are entirely subjective and do not in any way express original design or ownership of the styling used.
Legacy character sets on older terminals may not support all the character encodings required for some styles.
See SQLDataModel._generate_table_style() for implementation details related to each format.
- Changelog:
- Version 0.11.0 (2024-07-05):
Added style 'latex' to generate LaTeX style tables.
- Version 0.9.3 (2024-06-28):
Added styles 'rst-grid' and 'rst-simple' to allow SQLDataModel to generate table formats used by Sphinx and reStructuredText.
- Version 0.3.11 (2024-04-18):
Removed 'thick' style and added 'list' style for greater variety of available formats.
- Version 0.3.8 (2024-04-12):
New method.
- shape[source]
The current dimensions of the model as a tuple of (rows, columns).
- Type:
tuple[int, int]
- sort(by: str | int | Iterable[str | int] = None, asc: bool = True) SQLDataModel[source]
Sort the rows of the dataset by the specified column ordering. If no value is specified, the current SQLDataModel.sql_idx column is used with the default ordering asc = True.
- Parameters:
by (str | int | Iterable[str | int], optional) – The column or list of columns by which to sort the dataset. Defaults to sorting by the dataset’s index.
asc (bool, optional) – If True, sort in ascending order; if False, sort in descending order. Defaults to ascending order.
- Raises:
TypeError – If value for by argument is not one of type ‘str’, ‘int’ or ‘list’.
ValueError – If a specified column in by is not found in the current dataset or is an invalid column.
IndexError – If columns are indexed by integer but are outside of the current model range.
- Returns:
A new instance of SQLDataModel with rows sorted according to the specified column ordering.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01'),
    ('Sarah', 'West', 39, 0.7, '2023-10-01'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Sort by last name column
sorted_df = df.sort('last')

# View sorted model
print(sorted_df)
This will output:
┌───┬───────┬─────────┬──────┬─────────┬────────────┐
│   │ first │ last    │  age │ service │ hire_date  │
├───┼───────┼─────────┼──────┼─────────┼────────────┤
│ 0 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │
│ 1 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │
│ 2 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │
│ 3 │ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │
│ 4 │ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │
└───┴───────┴─────────┴──────┴─────────┴────────────┘
[5 rows x 5 columns]
Sort by multiple columns:
# Sort by multiple columns in descending order
sorted_df = df.sort(['age','hire_date'], asc=False)

# View sorted
print(sorted_df)
This will output:
┌───┬───────┬─────────┬──────┬─────────┬────────────┐
│   │ first │ last    │  age │ service │ hire_date  │
├───┼───────┼─────────┼──────┼─────────┼────────────┤
│ 0 │ Pat   │ Douglas │   42 │   11.50 │ 2015-11-06 │
│ 1 │ Sarah │ West    │   39 │    0.70 │ 2023-10-01 │
│ 2 │ Mike  │ Harlin  │   36 │    3.90 │ 2020-08-27 │
│ 3 │ Kelly │ Lee     │   32 │    8.00 │ 2016-09-18 │
│ 4 │ John  │ Smith   │   27 │    1.22 │ 2023-02-01 │
└───┴───────┴─────────┴──────┴─────────┴────────────┘
[5 rows x 5 columns]
Note
Standard sorting process for sqlite3 is used, whereby the ordering prefers the first column mentioned to the last.
Ascending and descending ordering follows this order of operations for multiple columns as well.
- Changelog:
- Version 0.8.0 (2024-06-21):
Modified to allow mixed integer and value indexing for columns sort order in by argument to reflect similar flexibility for column input across package.
- Version 0.5.1 (2024-05-10):
Modified to allow integer indexing for column sort order in by argument.
- Version 0.1.9 (2024-03-19):
New method.
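The column priority described in the note matches how an SQL ORDER BY clause treats multiple columns. The sketch below reproduces both documented orderings with the standard sqlite3 module; the sorted_rows helper is hypothetical, and applying one direction to every listed column is an assumption about how asc is used.

```python
import sqlite3

headers = ['first', 'last', 'age', 'service', 'hire_date']
data = [
    ('John', 'Smith', 27, 1.22, '2023-02-01'),
    ('Sarah', 'West', 39, 0.7, '2023-10-01'),
    ('Mike', 'Harlin', 36, 3.9, '2020-08-27'),
    ('Pat', 'Douglas', 42, 11.5, '2015-11-06'),
    ('Kelly', 'Lee', 32, 8.0, '2016-09-18'),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE sdm (first TEXT, last TEXT, age INTEGER, service REAL, hire_date TEXT)')
conn.executemany('INSERT INTO sdm VALUES (?, ?, ?, ?, ?)', data)

def sorted_rows(by, asc=True):
    """Fetch rows ordered by the given columns, the first column taking priority."""
    unknown = [col for col in by if col not in headers]
    if unknown:
        raise ValueError(f"columns not found: {unknown}")
    direction = 'ASC' if asc else 'DESC'
    order_by = ', '.join(f'"{col}" {direction}' for col in by)
    return conn.execute(f'SELECT * FROM sdm ORDER BY {order_by}').fetchall()

by_last = [row[0] for row in sorted_rows(['last'])]
by_age_desc = [row[0] for row in sorted_rows(['age', 'hire_date'], asc=False)]
print(by_last)
print(by_age_desc)
```

Later columns only break ties left by earlier ones, so with unique ages the hire_date column has no visible effect here.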
- sql_db_conn[source]
The in-memory sqlite3 connection object in use by the model.
- Type:
sqlite3.Connection
- sql_idx[source]
The index column name applied to the sqlite3 in-memory representation of the model. Default is 'idx'.
- Type:
str
- sql_model[source]
The table name applied to the sqlite3 in-memory representation of the model. Default is 'sdm'.
- Type:
str
- startswith(pat: str | Iterable[str], case: bool = True) set[int][source]
Return the indices of rows where any column value starts with the specified pattern(s), converting values with str(value) for comparison.
- Parameters:
pat (str | Iterable[str]) – The pattern or iterable of patterns to search for within the data.
case (bool, optional) – If True (default), the search is case-sensitive. If False, the search is case-insensitive.
- Raises:
TypeError – If argument for pat is not of type ‘str’ or an iterable of type ‘str’ representing the substring pattern(s).
- Returns:
Set of row indices containing values that match the pattern(s).
- Return type:
set[int]
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Sex', 'City']
data = [
    ('Mike', 31, 'M', 'Chicago'),
    ('John', 25, 'M', 'Dayton'),
    ('Alice', 27, 'F', 'Boston'),
    ('Sarah', 35, 'F', 'Houston'),
    ('Bobby', 42, 'M', 'Chicago'),
    ('Steve', 28, 'F', 'Austin'),
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Filter for rows where the 'City' column starts with the string 'Chi'
matching_indices = df['City'].startswith('Chi')

# Apply filter to model
df_city = df[matching_indices]

# View result
print(df_city)
This will output the result of applying the filter to the model:
┌───┬───────┬─────┬─────┬─────────┐
│   │ Name  │ Age │ Sex │ City    │
├───┼───────┼─────┼─────┼─────────┤
│ 0 │ Mike  │  31 │ M   │ Chicago │
│ 4 │ Bobby │  42 │ M   │ Chicago │
└───┴───────┴─────┴─────┴─────────┘
[2 rows x 4 columns]
Instead of searching a single column, the entire model can be searched:
# Method can also search all columns, and be applied directly
df_prefix = df[df.startswith('A', case=False)]

# View result
print(df_prefix)
This will output the result of a case-insensitive search:
┌───┬───────┬─────┬─────┬─────────┐
│   │ Name  │ Age │ Sex │ City    │
├───┼───────┼─────┼─────┼─────────┤
│ 2 │ Alice │  27 │ F   │ Boston  │
│ 5 │ Steve │  28 │ F   │ Austin  │
└───┴───────┴─────┴─────┴─────────┘
[2 rows x 4 columns]
This can be used in combination with the setitem syntax to selectively update values as well:
# Create a 'State' column with a default value
df['State'] = None

# Filter and set the values that start with the pattern
df[df.startswith('Chi'), 'State'] = 'Illinois'

# Multiple conditions can be used
tx_1 = df.startswith('Hou')
tx_2 = df.startswith('Aus')

# Then chained together using set notation
df[(tx_1 | tx_2), 'State'] = 'Texas'

# Alternatively, an iterable of patterns can be provided
df[df.startswith(['Hou','Aus']), 'State'] = 'Texas'
Note
Any non-string values are converted using str(value) for comparisons only.
See SQLDataModel.__eq__() for strict equality comparison operations.
See SQLDataModel.__and__() for more details on bitwise and set operations.
See SQLDataModel.__setitem__() for more details on syntax df[row, column] = value and correct usage.
See SQLDataModel.contains() and SQLDataModel.endswith() for additional string methods.
- Changelog:
- Version 0.7.8 (2024-06-18):
New method.
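Because the method returns a plain set of row indices, results from several searches combine with ordinary set operators. The stand-alone function below is a hypothetical re-creation of that behavior over a list of tuples, based only on the description above; it does not use or reproduce the package's SQL-backed implementation.

```python
data = [
    ('Mike', 31, 'M', 'Chicago'),
    ('John', 25, 'M', 'Dayton'),
    ('Alice', 27, 'F', 'Boston'),
    ('Sarah', 35, 'F', 'Houston'),
    ('Bobby', 42, 'M', 'Chicago'),
    ('Steve', 28, 'F', 'Austin'),
]

def startswith(rows, pat, case=True):
    """Return the set of row indices where any value starts with a pattern."""
    patterns = (pat,) if isinstance(pat, str) else tuple(pat)
    if not all(isinstance(p, str) for p in patterns):
        raise TypeError("pat must be a 'str' or an iterable of 'str'")
    if not case:
        patterns = tuple(p.lower() for p in patterns)
    matches = set()
    for idx, row in enumerate(rows):
        for value in row:
            # Non-string values are converted for the comparison only
            text = str(value) if case else str(value).lower()
            if text.startswith(patterns):
                matches.add(idx)
                break
    return matches

chicago = startswith(data, 'Chi')
letter_a = startswith(data, 'A', case=False)
texas = startswith(data, 'Hou') | startswith(data, 'Aus')
print(chicago, letter_a, texas)
```

The union of two single-pattern searches equals one search with both patterns, which is why the two 'Texas' assignments in the example above are interchangeable.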
- static_py_to_sql_map_dict[source]
The data type mapping to use when converting python types to SQL column types.
- Type:
dict
- static_sql_to_py_map_dict[source]
The data type mapping to use when converting SQL column types to python types.
- Type:
dict
- strip(characters: str = None, str_dtype_only: bool = True, inplace: bool = False) SQLDataModel | None[source]
Removes the specified characters from the beginning and end of each value in the current SQLDataModel, removing leading and trailing whitespace characters by default.
- Parameters:
characters (str, optional) – The characters to remove from both ends of the value. Default is None, removing whitespace (' ', '\t', '\n', '\r').
str_dtype_only (bool, optional) – If True, only columns with dtype = ‘str’ are stripped, otherwise all columns are stripped. Default is True.
inplace (bool, optional) – If True, modifies the current SQLDataModel instance in-place. Default is False.
- Raises:
TypeError – If characters argument is provided and is not of type 'str' representing unordered characters to remove.
- Returns:
If inplace=False, returns a new SQLDataModel with the stripped values. Otherwise modifies the current instance in-place, returning None.
- Return type:
SQLDataModel | None
Example:
import sqldatamodel as sdm

# Create a single item model
df = sdm.SQLDataModel([[' Hello, World! ']])

# Strip whitespace and print
print(df.strip())
This will output the model after stripping the leading and trailing whitespace characters:
┌───┬───────────────┐
│   │ 0             │
├───┼───────────────┤
│ 0 │ Hello, World! │
└───┴───────────────┘
[1 rows x 1 columns]
Non-whitespace characters can also be stripped:
import sqldatamodel as sdm

headers = ['Col A', 'Col B', 'Col C']
data = [
    ['A1', 'B1', 'C1'],
    ['A2', 'B2', 'C2'],
    ['A3', 'B3', 'C3']
]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Strip leading and trailing 'A' character
df_stripped = df.strip('A')

# View result
print(df_stripped)
This will output a new model where any leading and trailing ‘A’ characters have been removed:
┌───────┬───────┬───────┐
│ Col A │ Col B │ Col C │
├───────┼───────┼───────┤
│ 1     │ B1    │ C1    │
│ 2     │ B2    │ C2    │
│ 3     │ B3    │ C3    │
└───────┴───────┴───────┘
[3 rows x 3 columns]
Multiple characters can be stripped, and the model modified inplace:
# Strip multiple characters and this time modify model inplace
df.strip('123', inplace=True)

# View result
print(df)
This will output the modified model after stripping leading and trailing ‘123’ characters:
┌───────┬───────┬───────┐
│ Col A │ Col B │ Col C │
├───────┼───────┼───────┤
│ A     │ B     │ C     │
│ A     │ B     │ C     │
│ A     │ B     │ C     │
└───────┴───────┴───────┘
[3 rows x 3 columns]
Note
For string replacement instead of string removal, see SQLDataModel.replace().
When using str_dtype_only = False, numeric values may be modified due to SQLite’s type affinity rules.
This method is equivalent to the SQLite trim(string, character) function, wrapping and passing the equivalent arguments.
- Changelog:
- Version 0.4.3 (2024-05-07):
New method.
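Since the note ties this method to SQLite's trim function, its behavior can be explored directly with the standard sqlite3 module. The sql_trim helper below is hypothetical. One detail worth noting is that SQLite's one-argument trim removes only space characters, so a wrapper that also strips tabs and newlines would presumably pass those characters explicitly; that is an inference, not something this page states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def sql_trim(value, characters=None):
    """Apply SQLite's trim() to a value, optionally with a set of characters."""
    if characters is None:
        return conn.execute("SELECT trim(?)", (value,)).fetchone()[0]
    return conn.execute("SELECT trim(?, ?)", (value, characters)).fetchone()[0]

# Characters are treated as an unordered set, removed from both ends
print(repr(sql_trim('  Hello, World!  ')))
print(sql_trim('A1', 'A'))
print(sql_trim('A1', '123'))
print(sql_trim('321B123', '123'))

# A numeric input is converted to text before trimming, changing its value
print(repr(sql_trim(110, '0')))
```

The last call shows why stripping non-string columns can alter numeric data: 110 comes back as the text '11'.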
- table_style[source]
The table style used for string representations of the model. Available styles are 'ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid' or 'rst-simple'. Defaults to 'default' table style.
- Type:
str
- tail(n_rows: int = 5) SQLDataModel[source]
Returns the last n_rows of the current SQLDataModel.
- Parameters:
n_rows (int, optional) – Number of rows to return. Defaults to 5.
- Raises:
TypeError – If n_rows argument is not of type ‘int’ representing the number of rows to return from the tail of the model.
- Returns:
A new SQLDataModel instance containing the specified number of rows.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm

# Countries data available for sample dataset
url = 'https://developers.google.com/public-data/docs/canonical/countries_csv'

# Create the model
df = sdm.from_html(url)

# Get tail of model
df_tail = df.tail()

# View it
print(df_tail)
This will grab the bottom 5 rows by default:
┌─────┬─────────┬──────────┬───────────┬───────────────────┐
│     │ country │ latitude │ longitude │ name              │
├─────┼─────────┼──────────┼───────────┼───────────────────┤
│ 240 │ WF      │ -13.7688 │ -177.1561 │ Wallis and Futuna │
│ 241 │ EH      │  24.2155 │  -12.8858 │ Western Sahara    │
│ 242 │ YE      │  15.5527 │   48.5164 │ Yemen             │
│ 243 │ ZM      │ -13.1339 │   27.8493 │ Zambia            │
│ 244 │ ZW      │ -19.0154 │   29.1549 │ Zimbabwe          │
└─────┴─────────┴──────────┴───────────┴───────────────────┘
[5 rows x 4 columns]
Note
See related SQLDataModel.head() for the opposite, grabbing the top n_rows from the current model.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
- to_csv(filename: str = None, delimiter: str = ',', quotechar: str = '"', lineterminator: str = '\r\n', na_rep: str = 'None', encoding: str = 'utf-8', index: bool = False, **kwargs) str | None[source]
Writes SQLDataModel to the specified file if the filename argument is provided, otherwise returns the model directly as a CSV formatted string literal.
- Parameters:
filename (str) – The name of the CSV file to which the data will be written. Default is None, returning as raw literal.
delimiter (str, optional) – The delimiter to use for separating values. Default is ‘,’.
quotechar (str, optional) – The character used to quote fields. Default is ‘”’.
lineterminator (str, optional) – The character used to terminate the row and move to a new line. Default is ‘\r\n’.
na_rep (str, optional) – String representation to use for null or missing values. Default is ‘None’.
encoding (str, optional) – The encoding to use when writing the model to a CSV file. Default is ‘utf-8’.
index (bool, optional) – If True, includes the index in the CSV file; if False, excludes the index. Default is False.
**kwargs – Additional arguments to be passed to the csv.writer constructor.
- Returns:
If filename is None, returns the model as a delimited string literal. If filename is provided, writes the model to the specified file as a CSV file and returns None.
- Return type:
str | None
Example:
Returning CSV Literal
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Generate the literal using tab delimiter
csv_literal = df.to_csv(delimiter='\t')

# View output
print(csv_literal)
This will output:
Name	Age	Height
John	30	175.3
Alice	28	162.0
Travis	35	185.8
Write to File
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# CSV filename
csv_file = 'persons.csv'

# Write to the file, keeping the index
df.to_csv(filename=csv_file, index=True)
Contents of persons.csv:
idx,Name,Age,Height
0,John,30,175.3
1,Alice,28,162.0
2,Travis,35,185.8
Note
When index=True, the sdm_index property determines the column name of the index in the result.
Modifying delimiter affects how the data is delimited when writing to filename and when returning as raw literal; any valid delimiter can be used.
Quoting behavior can be modified by providing an additional keyword argument such as quoting=1 to wrap all values in quotes, or quoting=2 to quote only non-numeric values, see csv.QUOTE_X enums for all options.
Use SQLDataModel.to_text() to pretty print the table in a specified style for visualizing output if strict delimiting is unnecessary.
See SQLDataModel.from_csv() for creating a new SQLDataModel from existing CSV data.
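The quoting options mentioned in the notes are those of the standard csv module, so their effect can be previewed without the package. In the sketch below, to_csv_literal is a hypothetical helper that forwards keyword arguments to csv.writer and substitutes na_rep for None, in the spirit of the parameters described above; it is not the package's own writer.

```python
import csv
import io

headers = ['Name', 'Age', 'Height']
data = [('John', 30, 175.3), ('Alice', 28, None)]

def to_csv_literal(headers, rows, na_rep='None', **kwargs):
    """Write rows to an in-memory CSV string, replacing None with na_rep."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer, **kwargs)
    writer.writerow(headers)
    for row in rows:
        writer.writerow([na_rep if value is None else value for value in row])
    return buffer.getvalue()

# Default minimal quoting, then quoting=1 and quoting=2 by their enum names
minimal = to_csv_literal(headers, data)
quote_all = to_csv_literal(headers, data, quoting=csv.QUOTE_ALL)
non_numeric = to_csv_literal(headers, data, quoting=csv.QUOTE_NONNUMERIC, na_rep='')

print(minimal)
print(quote_all)
print(non_numeric)
```

Using the named constants csv.QUOTE_ALL and csv.QUOTE_NONNUMERIC instead of the bare integers 1 and 2 keeps the intent readable.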
- Changelog:
- Version 0.6.4 (2024-05-17):
Added encoding parameter to pass to file handler when writing contents as CSV file and set default to utf-8 to align with expected SQLite codec.
- Version 0.4.0 (2024-04-23):
Modified quoting behavior to avoid redundant quoting and to closely mimic csv module from standard library.
Added na_rep to fill null or missing values when generating output, useful for space delimited data and minimal quoting.
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- to_dict(orient: Literal['rows', 'columns', 'list'] = 'rows', index: bool = None) dict | list[dict][source]
Converts the SQLDataModel instance to a dictionary or a list of dictionaries based on the specified orientation.
- Parameters:
orient (Literal["rows", "columns", "list"]) – The orientation of the output, see examples for more detail.
"rows": Returns a dictionary with index values as keys and row values as values.
"columns": Returns a dictionary with column names as keys and column values as tuples.
"list": Returns a list of dictionaries, where each dictionary represents a row.
index (bool) – Whether to include the index column in the output. Defaults to the display_index property.
- Raises:
ValueError – If value for orient is not one of “rows”, “columns” or “list”.
- Returns:
The converted data structure based on the specified orientation.
- Return type:
dict|list[dict]
Examples:
Orient by Rows
import sqldatamodel as sdm

# Sample data, each value labeled as 'column,row'
headers = ['Col A', 'Col B', 'Col C']
data = [
    ['A,0', 'B,0', 'C,0'],
    ['A,1', 'B,1', 'C,1'],
    ['A,2', 'B,2', 'C,2']
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Convert to dictionary with index values as keys and row values as values
rows_dict = df.to_dict(orient="rows")

# View output
for k, v in rows_dict.items():
    print(f"{k}: {v}")
This will output:
0: ('A,0', 'B,0', 'C,0')
1: ('A,1', 'B,1', 'C,1')
2: ('A,2', 'B,2', 'C,2')
Orient by Columns
# Convert to dictionary with column names as keys and column values as tuples
columns_dict = df.to_dict(orient="columns")

# View output
for k, v in columns_dict.items():
    print(f"{k}: {v}")
This will output:
Col A: ('A,0', 'A,1', 'A,2')
Col B: ('B,0', 'B,1', 'B,2')
Col C: ('C,0', 'C,1', 'C,2')
Orient by List
# Convert to list of dictionaries with each dictionary representing a row with columns as keys
list_dict = df.to_dict(orient="list")

# View output
for row in list_dict:
    print(row)
This will output:
{'Col A': 'A,0', 'Col B': 'B,0', 'Col C': 'C,0'}
{'Col A': 'A,1', 'Col B': 'B,1', 'Col C': 'C,1'}
{'Col A': 'A,2', 'Col B': 'B,2', 'Col C': 'C,2'}
Note
Use index to return index data, otherwise the current instance display_index value will be used.
For 'list' orientation, the data returned is JSON-like in structure, where each row has its own “column”: “value” data.
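The three orientations can be approximated with built-in types alone. This is a rough sketch, not the library's implementation: the to_dict_sketch helper and its sample data are hypothetical, and index handling is reduced to a simple enumerate:

```python
# Hypothetical stand-in data; plain Python only, no SQLDataModel required
headers = ["Col A", "Col B"]
rows = [("A,0", "B,0"), ("A,1", "B,1")]

def to_dict_sketch(headers, rows, orient="rows"):
    """Approximate the three documented orientations using built-in types."""
    if orient == "rows":
        # index value -> tuple of row values
        return {i: tuple(row) for i, row in enumerate(rows)}
    if orient == "columns":
        # column name -> tuple of column values
        return {h: tuple(row[j] for row in rows) for j, h in enumerate(headers)}
    if orient == "list":
        # one {column: value} dictionary per row
        return [dict(zip(headers, row)) for row in rows]
    raise ValueError(f"invalid orient '{orient}', expected 'rows', 'columns' or 'list'")

rows_dict = to_dict_sketch(headers, rows, "rows")
columns_dict = to_dict_sketch(headers, rows, "columns")
list_dict = to_dict_sketch(headers, rows, "list")
```

The 'list' result is the shape that serializes directly to a JSON array of objects.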
- Changelog:
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.5 (2023-11-24):
New method.
- to_excel(filename: str, worksheet: int | str = 1, index: bool = False, if_exists: Literal['append', 'replace', 'fail'] = 'replace') None[source]
Writes the current SQLDataModel to the specified Excel filename.
- Parameters:
filename (str) – The file path to save the Excel file, e.g., filename = 'output.xlsx'.
worksheet (int | str, optional) – The index or name of the worksheet to write to. Defaults to 1, indicating the first worksheet.
index (bool, optional) – If SQLDataModel index should be included in the output. Default is False.
if_exists (Literal['append','replace','fail']) – Action to take if file already exists. Default is ‘replace’, overwriting existing file.
- Raises:
ModuleNotFoundError – If the required package openpyxl is not installed as determined by optionals._has_xl flag.
TypeError – If the filename argument is not of type ‘str’ representing a valid Excel file path to create or write to.
ValueError – If if_exists is not one of ‘append’, ‘replace’ or ‘fail’ representing action to take if file exists.
IndexError – If worksheet is provided as type ‘int’ but is out of range of the available worksheets.
Exception – If any unexpected exception occurs during the Excel writing and saving process.
- Returns:
If successful, a new Excel file filename is created and None is returned.
- Return type:
None
Example:
import openpyxl
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Rate', 'Gender']
data = [
    ('Alice', 25, 26.50, 'Female'),
    ('Bob', 30, 21.25, 'Male'),
    ('Will', 35, 24.00, 'Male'),
    ('Mary', 32, 23.75, 'Female')
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Export into a new Excel file
df.to_excel('Team-Overview.xlsx')

# Or append to existing Excel file as a new worksheet
df.to_excel('Team.xlsx', worksheet='Demographics', if_exists='append')
This will create a new Excel file Team-Overview.xlsx:
    ┌───────┬──────┬────────┬────────┐
    │ A     │ B    │ C      │ D      │
┌───┼───────┼──────┼────────┼────────┤
│ 1 │ Name  │ Age  │ Rate   │ Gender │
│ 2 │ Alice │ 25   │ 26.50  │ Female │
│ 3 │ Bob   │ 30   │ 21.25  │ Male   │
│ 4 │ Will  │ 35   │ 24.00  │ Male   │
│ 5 │ Mary  │ 32   │ 23.75  │ Female │
└───┴───────┴──────┴────────┴────────┘
[ Sheet1 ]
Note
Headers are dynamically inserted based on value for if_exists, where using ‘replace’ will include headers and ‘append’ will ignore them unless worksheet creation occurred.
When providing a string argument for worksheet, if the sheet does not exist, it will be created. However, if providing an integer index for an out of range sheet, an IndexError will be raised.
See related SQLDataModel.from_excel() for creating a SQLDataModel from existing Excel content.
- Changelog:
- Version 0.8.1 (2024-06-23):
Added if_exists parameter to provide the options to replace or append to existing file, as well as to fail if already exists.
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.2.2 (2024-03-26):
New method.
- to_html(filename: str = None, index: bool = None, encoding: str = 'utf-8', style_params: dict = None) str[source]
Returns the current SQLDataModel as a lightly formatted HTML <table> element as a string if filename is None. If filename is specified, writes the HTML to the specified file as .html and returns None.
- Parameters:
filename (str) – The file path to save the HTML content. If None, returns the HTML as a string (default is None).
index (bool) – Whether to include the index column in the HTML table (default is current display_index).
encoding (str) – Character encoding to use when writing model to HTML file, default set to 'utf-8'.
style_params (dict) – A dictionary representing CSS styles {property: value} to customize the appearance of the HTML table (default is None).
- Raises:
TypeError – If filename is not a valid string when specified or if style_params is not a dictionary when specified.
OSError – If encountered while trying to open and write the HTML to the file.
- Returns:
If filename is None, returns the HTML content as a string. If filename is specified, writes to the file and returns None.
- Return type:
str | None
Example:
import sqldatamodel as sdm

# Create the model
df = sdm.SQLDataModel(data=[(1, 'John'), (2, 'Doe')], headers=['ID', 'Name'])

# Create and save as new html file
df.to_html('output.html', style_params={'font-size': '12pt'})

# Get HTML as a string
html_string = df.to_html()

# View output
print(html_string)
This will output:
<table>
    <tr>
        <th>ID</th>
        <th>Name</th>
    </tr>
    <tr>
        <td>1</td>
        <td>John</td>
    </tr>
    <tr>
        <td>2</td>
        <td>Doe</td>
    </tr>
</table>
<style>
    table {font-size: 12pt;}
</style>
Note
Base styles are applied to reflect the styling of SQLDataModel in the terminal, including any display_color which is applied to the table CSS.
Table index is determined by the instance display_index attribute unless specified in the argument of the same name, overriding the instance attribute.
The default background-color is #E5E5E5, and the default font color is #090909, with 1 px solid border to mimic the repr for the instance.
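To illustrate how a {property: value} dictionary such as style_params can become a CSS rule next to a table, here is a small standard-library sketch. The html_table_sketch function is hypothetical and does not reproduce the library's base styles or exact markup:

```python
from html import escape

def html_table_sketch(headers, rows, style_params=None):
    """Build a minimal <table> string; a stand-in, not the library's renderer."""
    lines = ["<table>"]
    lines.append("  <tr>" + "".join(f"<th>{escape(str(h))}</th>" for h in headers) + "</tr>")
    for row in rows:
        lines.append("  <tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>")
    lines.append("</table>")
    if style_params:
        # Each {property: value} pair becomes one CSS declaration
        css = " ".join(f"{prop}: {value};" for prop, value in style_params.items())
        lines.append(f"<style>table {{{css}}}</style>")
    return "\n".join(lines)

html_string = html_table_sketch(["ID", "Name"], [(1, "John"), (2, "Doe")], {"font-size": "12pt"})
print(html_string)
```

Escaping cell values with html.escape is a choice made for this sketch; whether the library escapes cell content is not stated in this section.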
- Changelog:
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.9 (2024-03-19):
New method.
- to_json(filename: str = None, index: bool = None, **kwargs) list | None[source]
Converts the SQLDataModel instance to JSON format. If filename is specified, the JSON is written to the file; otherwise, a JSON-like object is returned.
- Parameters:
filename (str) – The path to the file where JSON will be written. If None, no file is created and JSON-like object is returned.
index (bool) – Whether to include the index column in the JSON. Defaults to the display_index property.
**kwargs – Additional keyword arguments to pass to the json.dump() method.
- Raises:
TypeError – If filename is not of type ‘str’.
Exception – If there is an OS related error encountered when opening or writing to the provided filename.
- Returns:
If filename is None, a list containing a JSON-like object is returned. Otherwise, the JSON file is created and None is returned.
- Return type:
list | None
Examples:
To JSON Literal
import sqldatamodel as sdm

# Sample JSON to first create model
json_source = [
    {"id": 1, "color": "red", "value": "#f00", "notes": "primary"},
    {"id": 2, "color": "green", "value": "#0f0", "notes": None},
    {"id": 3, "color": "blue", "value": "#00f", "notes": "primary"}
]

# Create the model
df = sdm.from_json(json_source)

# View current state
print(df)
This will output:
┌─────┬───────┬───────┬─────────┐
│  id │ color │ value │ notes   │
├─────┼───────┼───────┼─────────┤
│   1 │ red   │ #f00  │ primary │
│   2 │ green │ #0f0  │         │
│   3 │ blue  │ #00f  │ primary │
└─────┴───────┴───────┴─────────┘
[3 rows x 4 columns]
Write JSON File
# Write model to JSON file
df.to_json('output.json')

# Or convert to JSON-like object
json_data = df.to_json()

# View JSON object
print(json_data)
This will output:
[{
    "id": 1,
    "color": "red",
    "value": "#f00",
    "notes": "primary"
},
{
    "id": 2,
    "color": "green",
    "value": "#0f0",
    "notes": null
},
{
    "id": 3,
    "color": "blue",
    "value": "#00f",
    "notes": "primary"
}]
Note
When no filename is specified, JSON-like object will be returned as a rowwise array.
Any nested structure will be flattened by this method as well as the SQLDataModel.from_json() method.
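A row-wise array of objects is what Python's json module produces from a list of dictionaries. The sketch below uses only the standard library with made-up headers and rows, and shows how a missing value (None) becomes JSON null:

```python
import json

# Hypothetical headers and rows standing in for model data
headers = ["id", "color", "notes"]
rows = [(1, "red", "primary"), (2, "green", None)]

# One {column: value} object per row, i.e. a row-wise array
records = [dict(zip(headers, row)) for row in rows]

# json.dumps converts Python None to JSON null
json_text = json.dumps(records, indent=2)
print(json_text)
```

Keyword arguments such as indent are the kind of option that the **kwargs parameter above is documented to forward to json.dump().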
- Changelog:
- Version 0.3.2 (2024-04-02):
Changed return object to JSON string literal when filename=None to convert to valid literal object.
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.9 (2024-03-19):
New method.
- to_latex(filename: str = None, index: bool = False, bold_headers: bool = False, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, horizontal_ellipses: str = '⠤⠄', index_rep: str = None, format_output_as: Literal['table', 'document'] = 'table', column_alignment: Literal['left', 'center', 'right', 'dynamic'] = None) str | None[source]
Returns the current SQLDataModel as a LaTeX table string if filename is None, otherwise writes the table to the provided file as a LaTeX document.
- Parameters:
filename (str, optional) – The name of the file to write the LaTeX content. If not provided, the LaTeX content is returned as a string. Default is None.
index (bool, optional) – Whether to include the index column in the LaTeX output. Default is False.
bold_headers (bool, optional) – Whether the headers should be bolded in the LaTeX table. Default is False.
min_column_width (int, optional) – The minimum column width for table cells. Default is current value set on attribute SQLDataModel.min_column_width.
max_column_width (int, optional) – The maximum column width for table cells. Default is current value set on attribute SQLDataModel.max_column_width.
float_precision (int, optional) – The precision for floating-point values. Default is current value set on SQLDataModel.display_float_precision.
horizontal_ellipses (str, optional) – Characters to truncate column and cell values when horizontal truncation is required. Default is '⠤⠄'.
index_rep (str, optional) – String representation for the index. Default is None, using value set on SQLDataModel.sql_idx to represent the index column. Only used when generating index column, otherwise ignored when index = False.
format_output_as (Literal['table', 'document'], optional) – Whether the output should be formatted as a LaTeX table or as a standalone document. Default is ‘table’, formatting output as a modular table element.
column_alignment (Literal['left', 'center', 'right', 'dynamic'], optional) – The column alignment to use. Default is current value set on attribute SQLDataModel.column_alignment.
- Returns:
If filename is None, returns the LaTeX formatted table as a string; if filename is provided, writes the LaTeX table to the specified file and returns None.
- Return type:
str or None
- Raises:
TypeError – If the filename argument is not of type ‘str’, index argument is not of type ‘bool’, min_column_width or max_column_width argument is not of type ‘int’.
ValueError – If format_output_as is not one of ‘table’, ‘document’, or column_alignment is provided and is not one of ‘left’, ‘center’, ‘right’, ‘dynamic’.
Exception – If there is an OS related error encountered when opening or writing to the provided filename.
- LaTeX Formatting:
LaTeX output format that is generated can be set by format_output_as which provides one of two formats:
'table': Output formatted as insertable table, beginning and ending with LaTeX \begin{table} and \end{table} respectively.
'document': Output formatted as standalone document, beginning and ending with LaTeX \begin{document} and \end{document} respectively.
LaTeX table alignment will follow the SQLDataModel instance alignment, set by SQLDataModel.set_column_alignment():
'dynamic': Dynamically aligns column content, right for numeric types and left for remaining types.
'left': Left-aligns all column content, equivalent to LaTeX column format: |l|.
'center': Center-aligns all column content preferring left on uneven splits, equivalent to LaTeX column format: |c|.
'right': Right-aligns all column content, equivalent to LaTeX column format: |r|.
The generated LaTeX source rows always use dynamic alignment regardless of the column_alignment provided; this keeps the source format consistent and does not affect the alignment rendered by LaTeX.
Examples:
Returning LaTeX Literal
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data=data, headers=headers)

# Generate LaTeX table literal
latex_output = df.to_latex()

# View LaTeX output
print(latex_output)
This will output:
\begin{tabular}{|l|r|r|}
\hline
    {Name} & {Age} & {Height} \\
\hline
    John & 30 & 175.30 \\
    Alice & 28 & 162.00 \\
    Michael & 35 & 185.80 \\
\hline
\end{tabular}
Write to LaTeX File
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data=data, headers=headers)

# Write the output to the file, formatting the output as a proper LaTeX document
latex_table = df.to_latex(filename='Table.tex', format_output_as='document')
Contents of file Table.tex:
\documentclass{article}
\begin{document}
\begin{table}[h]
\centering
\begin{tabular}{|l|r|r|}
\hline
    {Name} & {Age} & {Height} \\
\hline
    John & 30 & 175.30 \\
    Alice & 28 & 162.00 \\
    Michael & 35 & 185.80 \\
\hline
\end{tabular}
\end{table}
\end{document}
Note
A \centering command is included in the LaTeX output by default regardless of alignments specified.
LaTeX headers and rows are indented by four spaces to keep with conventional table syntax and to distinguish the table data from commands.
- Table commands and headers are checked for invalid LaTeX characters such as '_' and '#', which are escaped; the model data is not. Accordingly, ensure any model content is valid LaTeX when rendering to PDF, or format content as valid LaTeX before exporting.
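Because model data is not escaped, one option is to escape string values before building the model. The following is a sketch under that assumption; escape_latex and LATEX_ESCAPES are hypothetical helpers, not part of the package, and the replacement strings are one common convention among several:

```python
# Characters with special meaning in LaTeX and one common way to escape each
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}

def escape_latex(value):
    """Escape special characters one at a time so replacements are never re-escaped."""
    if not isinstance(value, str):
        return value  # leave numbers and None untouched
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in value)

# Hypothetical rows containing characters LaTeX would otherwise reject
data = [("net_profit", "50%", 1200), ("R&D #1", "$5", 300)]
clean_data = [tuple(escape_latex(v) for v in row) for row in data]
print(clean_data)
```

The cleaned rows could then be passed to the model constructor before calling to_latex(), so that the exported cell content compiles.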
- Changelog:
- Version 0.11.0 (2024-07-05):
Added float_precision parameter to align with similar format specific methods and provide additional formatting options.
Added horizontal_ellipses parameter to allow customizing truncation characters used when column or cell values exceed maximum column widths.
Added index_rep parameter to allow customizing index column name with prior behavior set as default representation. Ignored when index = False.
Modified to use SQLDataModel.to_string() instead of generating independently formatted repr for more consistency between tabular outputs.
Modified to check and escape any invalid LaTeX characters or symbols when generating headers.
- Version 0.10.4 (2024-07-03):
Modified to escape newline characters through utils.sqlite_printf_format() to avoid wrapping table rows.
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.9 (2024-03-19):
New method.
- to_list(index: bool = False, include_headers: bool = False) list[source]
Returns the current SQLDataModel data as a 1-dimensional list of values if data dimensions are compatible with flattening, or as a list of lists if data is 2-dimensional. Data is returned without index or headers by default, use index = True or include_headers = True to modify.
- Parameters:
index (bool, optional) – If True, includes the index in the result, if False, excludes the index. Default is False.
include_headers (bool, optional) – If True, includes column headers in the result, if False, excludes headers. Default is False.
- Returns:
The flattened list of values corresponding to the model data.
- Return type:
list
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('Beth', 27, 172.4),
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data, headers)

# Get all model data as a list of lists
model_data = df.to_list()

# Iterate over each row
for row in model_data:
    print(row)
This will output:
['Beth', 27, 172.4]
['John', 30, 175.3]
['Alice', 28, 162.0]
['Travis', 35, 185.8]
Data will be flattened into a single dimension if possible, such as when accessing individual columns:
# Get 'Name' column as a list
col_data = df['Name'].to_list()

# View output
print(col_data)
This will output a list containing the values from each row for the column:
['Beth', 'John', 'Alice', 'Travis']
Data will also be flattened when accessing individual rows:
# Get first row as a list with index
row_data = df[0].to_list(index=True)

# View result
print(row_data)
This will output the row’s values including the index:
[0, 'Beth', 27, 172.4]
Note
See SQLDataModel.data() to return the equivalent of cursor.fetchall() with data as a list of tuples.
See SQLDataModel.iter_rows() to generate an iterable over the model data, which is preferred wherever possible.
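The flattening behavior shown in the examples can be sketched in plain Python. This assumes flattening applies when the data has exactly one row or exactly one column; to_list_sketch is a hypothetical helper and the library's precise rules may differ:

```python
def to_list_sketch(rows):
    """Flatten to one dimension for a single row or single column, else list of lists."""
    if len(rows) == 1:
        # Single row: return its values directly
        return list(rows[0])
    if rows and all(len(row) == 1 for row in rows):
        # Single column: return one value per row
        return [row[0] for row in rows]
    # Otherwise keep two dimensions
    return [list(row) for row in rows]

two_dimensional = to_list_sketch([("Beth", 27), ("John", 30)])
single_column = to_list_sketch([("Beth",), ("John",)])
single_row = to_list_sketch([(0, "Beth", 27)])
```

Under this reading, including the index adds a second column, which would explain why excluding it by default makes single-column results flattenable.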
- Changelog:
- Version 0.5.0 (2024-05-09):
Modified behavior to output 1-dimensional list when possible and a list of lists when not possible.
Changed default to index = False to increase surface for 1-dimensional flattening.
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.5 (2023-11-24):
New method.
- to_local_db(filename: str) None[source]
Writes the SQLDataModel in-memory database to disk as a SQLite database file using the specified filename.
- Parameters:
filename (str) – The filename or filepath to use when writing the model to disk.
- Raises:
TypeError – If filename is provided and is not of type ‘str’ representing a valid sqlite database save path.
sqlite3.Error – If there is an issue with the SQLite database operations during backup.
- Returns:
None
Example:
import sqlite3
import sqldatamodel as sdm

# Sample data
data = [('Alice', 20, 'F'), ('Billy', 25, 'M'), ('Chris', 30, 'M')]

# Create the model
df = sdm.SQLDataModel(data, headers=['Name','Age','Sex'])

# Filename to use for database
db_file = "model.db"

# Write the in-memory database model to disk
df.to_local_db(db_file)

# Loading the model back from disk can now be done at anytime
df = sdm.from_sql("sdm", sqlite3.connect(db_file))

# View restored model
print(df)
This will output the model we originally created:
┌───┬───────┬─────┬─────┐
│   │ Name  │ Age │ Sex │
├───┼───────┼─────┼─────┤
│ 0 │ Alice │  20 │ F   │
│ 1 │ Billy │  25 │ M   │
│ 2 │ Chris │  30 │ M   │
└───┴───────┴─────┴─────┘
[3 rows x 3 columns]
Note
Use any compatible SQL API to load the resulting database file or use SQLDataModel.from_sql() to reload it back into a SQLDataModel.
Table name is determined by value at SQLDataModel.sql_model which is set to 'sdm' by default, use SQLDataModel.set_model_name() to modify.
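Writing an in-memory SQLite database to disk can be done with the standard library's backup API, which the sqlite3.Error entry above suggests is involved. The sketch below is illustrative only and does not use SQLDataModel; the table layout and the idx column name are assumptions:

```python
import os
import sqlite3
import tempfile

# Build a small in-memory database with a table named 'sdm' (the documented default name)
memory_con = sqlite3.connect(":memory:")
memory_con.execute('CREATE TABLE sdm (idx INTEGER PRIMARY KEY, "Name" TEXT, "Age" INTEGER)')
memory_con.executemany("INSERT INTO sdm VALUES (?, ?, ?)", [(0, "Alice", 20), (1, "Billy", 25)])
memory_con.commit()

# Copy the in-memory database to a file on disk using the sqlite3 backup API
db_file = os.path.join(tempfile.mkdtemp(), "model.db")
disk_con = sqlite3.connect(db_file)
memory_con.backup(disk_con)
disk_con.close()
memory_con.close()

# Any SQLite client can now read the file back
check_con = sqlite3.connect(db_file)
restored = check_con.execute("SELECT * FROM sdm ORDER BY idx").fetchall()
check_con.close()
print(restored)
```

The resulting file is an ordinary SQLite database, so it can be opened by any tool that reads SQLite files.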
- Changelog:
- Version 0.5.2 (2024-05-13):
Renamed db parameter to filename for package consistency and to avoid confusion between similarly named database objects.
Changed filename from keyword to positional argument making it a required parameter to avoid accidental overwriting.
- Version 0.1.5 (2023-11-24):
New method.
- to_markdown(filename: str = None, index: bool = False, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, horizontal_ellipses: str = '⠤⠄', index_rep: str = None, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = None) str | None[source]
Returns the current SQLDataModel as a markdown table literal if filename is None, otherwise writes the table to the provided file as markdown.
- Parameters:
filename (str, optional) – The name of the file to write the Markdown content. If not provided, the Markdown content is returned as a string. Default is None.
index (bool, optional) – Whether to include the index column in the Markdown output. Default is False.
min_column_width (int, optional) – The minimum column width for table cells. Default is current value set on SQLDataModel.min_column_width.
max_column_width (int, optional) – The maximum column width for table cells. Default is current value set on SQLDataModel.max_column_width.
float_precision (int, optional) – The precision for floating-point values. Default is current value set on SQLDataModel.display_float_precision.
horizontal_ellipses (str, optional) – Characters to truncate column and cell values when horizontal truncation is required. Default is '⠤⠄'.
index_rep (str, optional) – String representation for the index. Default is None, using value set on SQLDataModel.sql_idx to represent the index column. Only used when generating index column, otherwise ignored when index = False.
column_alignment (Literal['dynamic', 'left', 'center', 'right'], optional) – The alignment for table columns. Default is current value set on SQLDataModel.column_alignment.
- Raises:
TypeError – If the filename argument is not of type ‘str’, or if float_precision, min_column_width or max_column_width arguments are not of type ‘int’.
ValueError – If the column_alignment argument is provided and is not one of ‘dynamic’, ‘left’, ‘center’, or ‘right’.
Exception – If there is an OS related error encountered when opening or writing to the provided filename.
- Returns:
If filename is None, returns the Markdown table as a string; if filename is provided, writes the Markdown table to the specified file and returns None.
- Return type:
str or None
- Column Alignment:
'dynamic': Dynamically aligns column content, right for numeric types and left for remaining types.
'left': Left-aligns all column content.
'center': Center-aligns all column content preferring left on uneven splits.
'right': Right-aligns all column content.
Examples:
To Markdown Literal
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data=data, headers=headers)

# Generate markdown table literal
markdown_table = df.to_markdown()

# View markdown output
print(markdown_table)
This will output:
| Name    |  Age |  Height |
|:--------|-----:|--------:|
| John    |   30 |  175.30 |
| Alice   |   28 |  162.00 |
| Michael |   35 |  185.80 |
Write to Markdown File
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Michael', 35, 185.8)
]

# Create the model
df = sdm.SQLDataModel(data=data, headers=headers)

# Write the output to the file, center-aligning all columns
df.to_markdown(filename='Table.MD', column_alignment='center')
Contents of Table.MD:
|  Name   | Age  | Height  |
|:-------:|:----:|:-------:|
|  John   |  30  | 175.30  |
|  Alice  |  28  | 162.00  |
| Michael |  35  | 185.80  |
Note
All markdown output will contain the alignment characters ':' as determined by the SQLDataModel.column_alignment attribute or parameter.
Any exception encountered during file read or writing operations is caught and reraised, see related SQLDataModel.from_markdown().
Use index_rep to provide a different representation, column name, for the index column if included in output.
Unlike other representations, no rowwise or vertical truncation is performed on output content.
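The alignment characters live in the separator row beneath the header. As a rough sketch of how such a row could be built from column widths and alignments (the separator_cell and separator_row helpers are hypothetical, and 'dynamic' is assumed to be resolved to left or right per column beforehand):

```python
def separator_cell(width, alignment):
    """Return the markdown separator cell for a column of the given content width."""
    total = width + 2  # one space of padding on each side of the content
    if alignment == "left":
        return ":" + "-" * (total - 1)
    if alignment == "right":
        return "-" * (total - 1) + ":"
    if alignment == "center":
        return ":" + "-" * (total - 2) + ":"
    raise ValueError(f"invalid alignment '{alignment}'")

def separator_row(widths, alignments):
    """Join one separator cell per column into a full markdown separator row."""
    return "|" + "|".join(separator_cell(w, a) for w, a in zip(widths, alignments)) + "|"

# Text column left-aligned, numeric columns right-aligned, as 'dynamic' describes
dynamic_row = separator_row([7, 4, 7], ["left", "right", "right"])
center_row = separator_row([7, 4, 7], ["center"] * 3)
print(dynamic_row)
print(center_row)
```

Markdown renderers read only the colon positions in this row, so cell padding in the data rows affects the raw text but not the rendered alignment.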
- Changelog:
- Version 0.11.0 (2024-07-05):
Added horizontal_ellipses parameter to allow customizing truncation characters used when column or cell values exceed maximum column widths.
Added index_rep parameter to allow customizing index column name with prior behavior set as default representation. Ignored when index = False.
Modified to use SQLDataModel.to_string() instead of generating independently formatted repr for more consistency between tabular outputs.
- Version 0.10.4 (2024-07-03):
Modified to escape newline characters through utils.sqlite_printf_format() to avoid wrapping table rows.
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.9 (2024-03-19):
New method.
- to_numpy(index: bool = False, include_headers: bool = False) _np.ndarray[source]
Converts SQLDataModel to a NumPy ndarray object of shape (rows, columns). Note that the numpy package must be installed to use this method.
- Parameters:
index (bool, optional) – If True, includes the model index in the result. Default is False.
include_headers (bool, optional) – If True, includes column headers in the result. Default is False.
- Raises:
ModuleNotFoundError – If NumPy is not installed.
- Returns:
The model’s data converted into a NumPy array.
- Return type:
numpy.ndarray
Example:
import numpy
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the sample model
df = sdm.SQLDataModel(data, headers)

# Create the numpy array with default parameters, no indices or headers
result_array = df.to_numpy()

# View array
print(result_array)
This will output:
[['John' '30' '175.3']
 ['Alice' '28' '162.0']
 ['Travis' '35' '185.8']]
Model headers can also be retained:
# Create the numpy array with indices and headers
result_array = df.to_numpy(index=True, include_headers=True)

# View array
print(result_array)
This will output:
[['idx' 'Name' 'Age' 'Height']
 ['0' 'John' '30' '175.3']
 ['1' 'Alice' '28' '162.0']
 ['2' 'Travis' '35' '185.8']]
Note
Output will always be a 2-dimensional array of type numpy.ndarray.
- Changelog:
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.3 (2023-10-15):
New method.
- to_pandas(index: bool = False, include_headers: bool = True) _pd.DataFrame[source]
Converts SQLDataModel to a Pandas DataFrame object. Note that the pandas package must be installed to use this method.
- Parameters:
index (bool, optional) – If True, includes the model index in the result. Default is False.
include_headers (bool, optional) – If True, includes column headers in the result. Default is True.
- Raises:
ModuleNotFoundError – If Pandas is not installed.
- Returns:
The model’s data converted to a Pandas DataFrame.
- Return type:
pandas.DataFrame
Example:
import pandas
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df_sdm = sdm.SQLDataModel(data, headers)

# Convert the model to a pandas df
df_pd = df_sdm.to_pandas(include_headers=True, index=True)

# View result
print(df_pd)
This will output:
     Name  Age  Height
0    John   30   175.3
1   Alice   28   162.0
2  Travis   35   185.8
Note
SQLDataModel uses different data types than those used in pandas, see SQLDataModel.set_column_dtypes() for more information about casting rules.
- Changelog:
- Version 0.3.0 (2024-03-31):
Renamed include_index parameter to index for package consistency.
- Version 0.1.3 (2023-10-15):
New method.
- to_parquet(filename: str, index: bool = True, **kwargs) None[source]
Writes the current SQLDataModel to the specified parquet filename.
- Parameters:
filename (str) – The file path to save the parquet file, e.g., filename = 'user/data/output.parquet'.
index (bool, optional) – Whether or not the SQLDataModel index should be included in the export. Default is True.
**kwargs – Additional keyword arguments to pass to the pyarrow write_table function.
- Raises:
ModuleNotFoundError – If the required package pyarrow is not installed as determined by optionals._has_pa flag.
TypeError – If the filename argument is not of type ‘str’ representing a valid parquet file path.
Exception – If any unexpected exception occurs during the parquet writing process.
- Returns:
If successful, a new parquet file filename is created and None is returned.
- Return type:
None
Example:
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Rate']
data = [('Alice', 25, 26.50), ('Bob', 30, 21.25), ('Will', 35, 24.00)]

# Create the model
df = sdm.SQLDataModel(data, headers, display_index=False)

# Parquet file
pq_file = "output.parquet"

# Write the model as parquet file
df.to_parquet(pq_file)

# Confirm result by reading back file
df_result = sdm.from_parquet(pq_file)

# View model
print(df_result)
This will output:
┌───────┬──────┬────────┐
│ Name  │  Age │   Rate │
├───────┼──────┼────────┤
│ Alice │   25 │  26.50 │
│ Bob   │   30 │  21.25 │
│ Will  │   35 │  24.00 │
└───────┴──────┴────────┘
[3 rows x 3 columns]
Note
The pyarrow package is required to use this method as well as the SQLDataModel.from_parquet() method.
The SQLDataModel.to_dict() method is used prior to writing to parquet to convert the SQLDataModel into a dictionary suitable for parquet Table format.
Exceptions raised by the pyarrow package and its methods are caught and reraised when encountered to keep with package error formatting.
- Changelog:
- Version 0.8.2 (2024-06-24):
Added index parameter to toggle inclusion of SQLDataModel index column for greater flexibility and package consistency to similar methods.
- Version 0.1.9 (2024-03-19):
New method.
- to_pickle(filename: str = None) None[source]
Save the SQLDataModel instance to the specified filename as a pickle object.
- Parameters:
filename (str, optional) – The file name to save the model to. If None, the invoking Python file’s name with a “.sdm” extension will be used.
- Raises:
TypeError – If filename is provided but is not of type ‘str’ representing a valid pickle filepath.
- Returns:
None
Example:
import sqldatamodel as sdm

headers = ['idx', 'first', 'last', 'age']
data = [
    (0, 'john', 'smith', 27),
    (1, 'sarah', 'west', 29),
    (2, 'mike', 'harlin', 36),
    (3, 'pat', 'douglas', 42)
]

# Create the SQLDataModel object
df = sdm.SQLDataModel(data, headers)

# Save the model's data as a pickle file "output.sdm"
df.to_pickle("output.sdm")

# Alternatively, leave blank to use the current file's name:
df.to_pickle()

# This way the same data can be recreated later by calling the from_pickle() method from the same project:
df = sdm.from_pickle()
Note
All data, headers, data types and display properties will be saved when pickling.
If no filename argument is provided, then the invoking module’s __name__ property will be used by default.
- to_polars(index: bool = False, include_headers: bool = True) _pl.DataFrame[source]
Converts SQLDataModel to a Polars DataFrame object. Note that the polars package must be installed to use this method.
- Parameters:
index (bool, optional) – If True, includes the model index in the result. Default is False.
include_headers (bool, optional) – If True, includes column headers in the result. Default is True.
- Raises:
ModuleNotFoundError – If Polars is not installed.
- Returns:
The model’s data converted to a Polars DataFrame.
- Return type:
polars.DataFrame
Example:
import polars
import sqldatamodel as sdm

# Sample data
headers = ['Name', 'Age', 'Height']
data = [
    ('Beth', 27, 172.4),
    ('John', 30, 175.3),
    ('Alice', 28, 162.0),
    ('Travis', 35, 185.8)
]

# Create the model
df_sdm = sdm.SQLDataModel(data, headers)

# Convert the model to a polars df with the index
df_pl = df_sdm.to_polars(index=True)

# View result
print(df_pl)
This will output:
shape: (4, 4)
┌─────┬────────┬─────┬────────┐
│ idx ┆ Name   ┆ Age ┆ Height │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ i64 ┆ str    ┆ i64 ┆ f64    │
╞═════╪════════╪═════╪════════╡
│ 0   ┆ Beth   ┆ 27  ┆ 172.4  │
│ 1   ┆ John   ┆ 30  ┆ 175.3  │
│ 2   ┆ Alice  ┆ 28  ┆ 162.0  │
│ 3   ┆ Travis ┆ 35  ┆ 185.8  │
└─────┴────────┴─────┴────────┘
Note
See related SQLDataModel.from_polars() for the inverse method of converting a Polars DataFrame object into a SQLDataModel.
SQLDataModel uses different data types than those used in polars, see SQLDataModel.set_column_dtypes() for more information about casting rules.
Polars has no concept of an index column, so when using index=True the SQLDataModel index is returned as an additional column in the DataFrame object.
- Changelog:
- Version 1.1.0 (2024-10-22):
Added orient = 'row' argument to explicitly set data orientation when constructing dataframe.
- Version 0.3.8 (2024-04-12):
New method.
- to_pyarrow(index: bool = False) _pa.Table[source]
Returns the current
SQLDataModelin Apache Arrow columnar format as apyarrow.Table.- Parameters:
index (bool, optional) – Specifies whether to include the index of the SQLDataModel in the resulting Table. Default is False.
- Raises:
ModuleNotFoundError – If the required package
pyarrowis not installed.Exception – If any unexpected exception occurs during the pyarrow conversion process.
- Returns:
A table representing the current
SQLDataModelin Apache Arrow columnar format.- Return type:
pyarrow.Table
Example:
import sqldatamodel as sdm # Sample data headers = ['Name', 'Age', 'Grade'] data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2)] # Create the model df = sdm.SQLDataModel(data, headers) # Create the pyarrow table table = df.to_pyarrow() # View result print(table)
This will output the
pyarrowobject details:pyarrow.Table Name: string Age: int64 Grade: double ---- Name: [["Alice","Bob","Charlie"]] Age: [[25,30,35]] Grade: [[3.8,3.9,3.2]]
Note
Unmodified Python types follow the conversion and casting rules of the
pyarrow implementation. For the modified date and datetime types, date32[day] and timestamp[us] are used, respectively.
- Changelog:
- Version 0.3.0 (2024-03-31):
Renamed
include_indexparameter toindexfor package consistency.
- Version 0.2.3 (2024-03-28):
New method.
- to_sql(table: str, con: Connection | Any, *, schema: str = None, if_exists: Literal['fail', 'replace', 'append'] = 'fail', index: bool = True, primary_key: str | int = None) None[source]
Insert the
SQLDataModelinto the specified table using the provided database connection.- Supported Connection APIs:
SQLite using
sqlite3or url with format'file:///path/to/database.db'PostgreSQL using
psycopg2or url with format'postgresql://user:pass@hostname:port/db'SQL Server ODBC using
pyodbcor url with format'mssql://user:pass@hostname:port/db'Oracle using
cx_Oracleor url with format'oracle://user:pass@hostname:port/db'Teradata using
teradatasqlor url with format'teradata://user:pass@hostname:port/db'
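The URL formats listed above follow the usual `scheme://user:pass@hostname:port/db` layout. As a rough illustration of how such a URL breaks into its parts, here is a sketch using `urllib.parse` from the standard library. It is not the package's own parser; the function name `describe_connection_url` and the numeric port `5432` are assumptions for the example.

```python
from urllib.parse import urlsplit

def describe_connection_url(url: str) -> dict:
    """Split a connection url into the pieces a database driver would need."""
    parts = urlsplit(url)
    # For file urls the whole path is the database location; otherwise the
    # database name is the path without its leading slash
    database = parts.path if parts.scheme == "file" else parts.path.lstrip("/")
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": database,
    }

info = describe_connection_url("postgresql://user:pass@hostname:5432/db")
file_info = describe_connection_url("file:///path/to/database.db")
```

The scheme is the part that identifies which driver module is needed, which is why a missing driver surfaces as a `ModuleNotFoundError` when a URL is supplied.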
- Parameters:
table (str) – The name of the table where data will be inserted.
con (sqlite3.Connection | Any) – The database connection object or connection url. Supported connection APIs are
sqlite3,psycopg2,pyodbc,cx_Oracle,teradatasqlschema (str, optional) – The schema to use for PostgreSQL and ODBC SQL Server connections, ignored otherwise. Default is None.
if_exists (Literal['fail', 'replace', 'append'], optional) – Action to take if the table already exists. If
failan error is raised if table exists and no inserts occur. Ifreplaceany existing table is dropped prior to inserts. Ifappendexisting table is appended to by subsequent inserts.index (bool, optional) – If the model index should be included in the target table. Default is True.
primary_key (str | int, optional) – Column name or index to use as table primary key. Default is None, using the index column as the primary key when
index=True.
- Raises:
SQLProgrammingError – If an error occurs during cursor accessing, table creation or data insertion into the database.
ModuleNotFoundError – If
conis provided as a connection url and the specified scheme driver module is not found.ValueError – If specified
tablealready exists when usingif_exists='fail'or ifconis not one of the currently supported connection modules.IndexError – If
primary_keyis provided as anintrepresenting a column index but is out of range of the current modelSQLDataModel.column_count.TypeError – If
primary_keyargument provided is not of type ‘str’ or ‘int’ representing a valid column name or index to use as the primary key column for the target table.
- Returns:
None
Example:
import sqlite3 import sqldatamodel as sdm # Sample data headers = ['Name', 'Age', 'Grade'] data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2), ('David', 28, 3.4)] # Create the model df = sdm.SQLDataModel(data, headers) # Create connection object sqlite_db_conn = sqlite3.connect('students.db') # Basic usage, creating a new table df.to_sql('users', sqlite_db_conn)
This will create a new table
users, or fail if one already exists:sqlite> select * from users; idx Name Age Grade --- ------- --- ----- 0 Alice 25 3.8 1 Bob 30 3.9 2 Charlie 35 3.2 3 David 28 3.4
Connect to PostgreSQL, SQL Server, Oracle or Teradata:
import psycopg2 import sqldatamodel as sdm # Sample data headers = ['Name', 'Age', 'Grade'] data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2), ('David', 28, 3.4)] # Create the model df = sdm.SQLDataModel(data, headers) # Setup the connection, whether using psycopg2 or other supported modules like pyodbc con = psycopg2.connect(...) # Create or replace existing table in database df.to_sql('users', con, if_exists='replace', index=False)
This will result in a new table
usersin our PostgreSQL database:=> select * from users; Name | Age | Grade | --------+-----+-------+ Alice | 25 | 3.8 | Bob | 30 | 3.9 | Charlie | 35 | 3.2 | David | 28 | 3.4 |
For SQL Server connections using
pyodbc, the example would be almost identical except for whichconobject we use:import pyodbc # For SQL Server ODBC connections using pyodbc con = pyodbc.connect(...)
The same is true for Oracle and other connections:
import cx_Oracle # For Oracle connections using cx_Oracle con = cx_Oracle.connect(...)
Using a Primary Key
import sqlite3 import sqldatamodel as sdm # Sample data headers = ['ID', 'User'] data = [(1001, 'Alice'), (1002, 'Bob'), (1003, 'Charlie'), (1004, 'David')] # Create the model df = sdm.SQLDataModel(data, headers) # Create connection object sqlite_db_conn = sqlite3.connect('students.db') # Create the table using the 'ID' column as the primary key df.to_sql('users', sqlite_db_conn, if_exists='replace', index=False, primary_key='ID')
This will create a
userstable with the schema:sqlite> .schema users CREATE TABLE "users" ( "ID" INTEGER PRIMARY KEY, "User" TEXT);
With the
IDcolumn as its primary key:sqlite> select * from users; ID User ---- ------- 1001 Alice 1002 Bob 1003 Charlie 1004 David
If table creation is necessary, column types will be mapped according to the destination database by the following conversion:
┌─────────────────┬─────────┬─────────┬────────┬─────────┬────────────────┬──────┬───────────┐ │ Database \ Type │ NULL │ INTEGER │ REAL │ TEXT │ BLOB │ DATE │ TIMESTAMP │ ├─────────────────┼─────────┼─────────┼────────┼─────────┼────────────────┼──────┼───────────┤ │ PostgreSQL │ UNKNOWN │ INTEGER │ FLOAT │ TEXT │ BYTEA │ DATE │ TIMESTAMP │ │ SQL Server ODBC │ UNKNOWN │ INTEGER │ FLOAT │ TEXT │ VARBINARY(MAX) │ DATE │ DATETIME │ │ Oracle │ UNKNOWN │ NUMBER │ NUMBER │ VARCHAR │ BLOB │ DATE │ DATETIME │ │ Teradata │ UNKNOWN │ INTEGER │ FLOAT │ VARCHAR │ BYTE │ DATE │ DATETIME │ │ SQLite │ NULL │ INTEGER │ REAL │ TEXT │ BLOB │ DATE │ TIMESTAMP │ └─────────────────┴─────────┴─────────┴────────┴─────────┴────────────────┴──────┴───────────┘ [5 rows x 8 columns]
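To make the conversion table concrete, the sketch below builds a `CREATE TABLE` statement from a per-dialect type mapping and runs it against an in-memory SQLite database. Only the SQLite and PostgreSQL rows are copied from the table. The names `TYPE_MAP` and `create_table_sql` are hypothetical, and the statement layout is an assumption rather than the DDL the package actually emits.

```python
import sqlite3

# Two rows copied from the conversion table; other dialects follow the same pattern
TYPE_MAP = {
    "sqlite": {"NULL": "NULL", "INTEGER": "INTEGER", "REAL": "REAL", "TEXT": "TEXT",
               "BLOB": "BLOB", "DATE": "DATE", "TIMESTAMP": "TIMESTAMP"},
    "postgresql": {"NULL": "UNKNOWN", "INTEGER": "INTEGER", "REAL": "FLOAT", "TEXT": "TEXT",
                   "BLOB": "BYTEA", "DATE": "DATE", "TIMESTAMP": "TIMESTAMP"},
}

def create_table_sql(table, dtypes, dialect="sqlite", primary_key=None):
    """Build a CREATE TABLE statement, mapping model types to the target dialect."""
    mapping = TYPE_MAP[dialect]
    columns = []
    for name, model_type in dtypes.items():
        definition = f'"{name}" {mapping[model_type]}'
        if name == primary_key:
            definition += " PRIMARY KEY"
        columns.append(definition)
    return f'CREATE TABLE "{table}" ({", ".join(columns)})'

ddl = create_table_sql("users", {"ID": "INTEGER", "User": "TEXT"}, primary_key="ID")
pg_ddl = create_table_sql("blobs", {"payload": "BLOB", "score": "REAL"}, dialect="postgresql")

# The database, not the model, enforces primary key uniqueness
con = sqlite3.connect(":memory:")
con.execute(ddl)
con.execute('INSERT INTO "users" VALUES (?, ?)', (1001, "Alice"))
try:
    con.execute('INSERT INTO "users" VALUES (?, ?)', (1001, "Bob"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The last step shows why a `primary_key` column should already be unique before writing: in this SQLite sketch a duplicate key is rejected by the database at insert time.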
Note
When providing a
primary_key column, it is assumed to be unique and the model will not check or enforce uniqueness. When
conis provided as a string a connection will be attempted usingutils._create_connection()if the path does not exist, otherwise asqlite3local connection will be attempted.When
conis provided as an object a connection is assumed to be open and valid, if a cursor cannot be created from the object an exception will be raised.Connections with write access can be used in the
SQLDataModel.to_sql() method for writing to the same connection types, so use them with care. A ValueError will be raised if
table already exists and if_exists='fail' (the default); use if_exists='replace' or if_exists='append' to replace or append to the table instead. See the relevant module documentation for additional details pertaining to the specific database or connection dialect being used.
See related
SQLDataModel.from_sql()for creatingSQLDataModelfrom existing SQL database connections.See utility methods
utils._parse_connection_url()andutils._create_connection()for implementation on creating database connections from urls.
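The three `if_exists` behaviors can be sketched against an in-memory SQLite database using only the standard library. This illustrates the documented semantics and is not the package's implementation. The helpers `table_exists`, `write_rows` and `row_count` are hypothetical, and the untyped column definitions are a simplification.

```python
import sqlite3

def table_exists(con, table):
    """Return True if a table with this name exists in the SQLite database."""
    row = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row is not None

def write_rows(con, table, headers, rows, if_exists="fail"):
    """Insert rows, handling an existing table according to if_exists."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"invalid if_exists value: {if_exists!r}")
    exists = table_exists(con, table)
    if exists and if_exists == "fail":
        raise ValueError(f"table '{table}' already exists")
    if exists and if_exists == "replace":
        con.execute(f'DROP TABLE "{table}"')  # drop before any inserts occur
        exists = False
    if not exists:
        columns = ", ".join(f'"{h}"' for h in headers)
        con.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in headers)
    con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    con.commit()

def row_count(con, table):
    return con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

headers = ['Name', 'Age']
rows = [('Alice', 25), ('Bob', 30)]
con = sqlite3.connect(":memory:")

write_rows(con, "users", headers, rows)                       # creates the table
after_create = row_count(con, "users")
write_rows(con, "users", headers, rows, if_exists="append")   # adds to existing rows
after_append = row_count(con, "users")
write_rows(con, "users", headers, rows, if_exists="replace")  # drops, then recreates
after_replace = row_count(con, "users")
try:
    write_rows(con, "users", headers, rows)                   # default is 'fail'
    failed = False
except ValueError:
    failed = True
```

The existence check happens before any insert, so with `'fail'` an existing table is left untouched.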
- Changelog:
- Version 0.9.1 (2024-06-27):
Modified handling of
conparameter to allow database connection url to also be provided as'scheme://user:pass@host:port/db'
- Version 0.8.2 (2024-06-24):
Modified handling of
conparameter to allow providing SQLite database filepath directly as string to instantiate connection.
- Version 0.3.0 (2024-03-31):
Renamed arguments
extern_con:con,replace_existing:if_exists,include_index:index.Added
primary_keyargument for specifying a primary key column for table schema.Added
schemaargument for specifying a target schema for the table.
- to_string(index: bool = None, display_max_rows: int = None, display_max_width: int = None, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, vertical_ellipses: str = '⠒⠂', horizontal_ellipses: str = '⠤⠄', display_dimensions: bool = False, index_rep: str = None, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = None, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = None) str[source]
Generate a tabular representation of the model based on custom parameters and bounds.
- Parameters:
index (bool, optional) – Whether to include the index column in the output. Default is None, using
SQLDataModel.display_indexvalue.display_max_rows (int, optional) – Maximum number of rows to display. Default is None, using
SQLDataModel.display_max_rowsvalue.display_max_width (int, optional) – Maximum character width of the output table before horizontal truncation occurs. Default is None, generating a full width representation.
min_column_width (int, optional) – Minimum width of columns. Default is None, using
SQLDataModel.min_column_widthvalue.max_column_width (int, optional) – Maximum width of columns. Default is None, using
SQLDataModel.max_column_widthvalue with a floor value of 2.float_precision (int, optional) – Precision for displaying float values. Default is None, using
SQLDataModel.display_float_precisionvalue.vertical_ellipses (str, optional) – Characters to represent row truncation when vertical truncation is required. Default is
'⠒⠂'.horizontal_ellipses (str, optional) – Characters to represent column truncation when horizontal truncation is required. Default is
'⠤⠄'.display_dimensions (bool, optional) – Whether to display the dimensions of the table. Defaults to False.
index_rep (str, optional) – String representation for the index. Default is None, using a single whitespace character to represent the index column. Only used when generating index column, otherwise ignored when
index = False.column_alignment (Literal['dynamic', 'left', 'center', 'right'], optional) – Alignment for columns. Default is None, using
SQLDataModel.column_alignmentvalue.table_style (Literal['ascii','bare','dash','default','double','latex','list','markdown','outline','pandas','polars','postgresql','round','rst-grid','rst-simple'], optional) – Table style. Default is None, using
SQLDataModel.table_stylevalue.
- Returns:
A string representing the tabular output of the model with the restrictions and styling applied.
- Return type:
str
Example:
import sqldatamodel as sdm # Sample data headers = ['Name', 'Age', 'Gender', 'City'] data = [ ('Alice', 38, 'Female', 'Milwaukee'), ('Sarah', None, 'Female', 'Houston'), ('Michael', 42, 'Male', 'Atlanta'), ('John', None, 'Male', 'Boston'), ('Bobby', 25, 'Male', 'Chicago'), ] # Create the model df = sdm.SQLDataModel(data, headers) # Generate a Markdown style representation using 'ID' to represent the index markdown_repr = df.to_string(table_style='markdown', index_rep='ID')
This will generate a ‘Markdown’ styled representation:
| ID | Name | Age | Gender | City | |----|---------|-----|--------|-----------| | 0 | Alice | 38 | Female | Milwaukee | | 1 | Sarah | | Female | Houston | | 2 | Michael | 42 | Male | Atlanta | | 3 | John | | Male | Boston | | 4 | Bobby | 25 | Male | Chicago |
Vertical and horizontal limits can also be applied:
# Set vertical and horizontal limits with custom styling truncated_repr = df.to_string( table_style='polars', display_max_rows=4, display_max_width=36, horizontal_ellipses='..', vertical_ellipses='...', index=False ) # View output print(truncated_repr)
This will output a vertically and horizontally truncated representation that fits within the bounds provided:
┌───────┬─────┬────┬───────────┐ │ Name ┆ Age ┆ .. ┆ City │ ╞═══════╪═════╪════╪═══════════╡ │ Alice ┆ 38 ┆ .. ┆ Milwaukee │ │ Sarah ┆ ┆ .. ┆ Houston │ │ ... ┆ ... ┆ .. ┆ ... │ │ John ┆ ┆ .. ┆ Boston │ │ Bobby ┆ 25 ┆ .. ┆ Chicago │ └───────┴─────┴────┴───────────┘
Note
Table styles reflect style similarity only; format-specific methods should be used for generating complete and valid output.
Vertical truncation characters are applied to column wide truncation and horizontal truncation characters are applied at row and cell level.
When a discrepancy exists between minimum and maximum column widths, conflict is resolved by setting max width equal to
max(min_column_width, max_column_width).See
SQLDataModel.to_text()for writing textual representation directly to ‘.txt’ files.See
SQLDataModel.set_table_style()for available style options and output examples.
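The width rule in the note above, together with cell-level truncation, can be sketched in a few lines. The helper names `resolve_widths` and `fit_cell` are hypothetical. How the package places the ellipses inside a truncated cell is an assumption here: this sketch simply replaces the tail of the value.

```python
def resolve_widths(min_column_width, max_column_width):
    """Resolve conflicting bounds by raising max to at least min."""
    return min_column_width, max(min_column_width, max_column_width)

def fit_cell(text, max_width, ellipses="⠤⠄"):
    """Truncate a cell value to max_width, marking the cut with ellipses."""
    if len(text) <= max_width:
        return text
    keep = max(max_width - len(ellipses), 0)
    # Slice again so the result never exceeds max_width, even for tiny widths
    return (text[:keep] + ellipses)[:max_width]

conflicting = resolve_widths(10, 4)
consistent = resolve_widths(3, 38)
truncated = fit_cell("Milwaukee", 6, "..")
untouched = fit_cell("Bob", 6)
```

With a minimum of 10 and a maximum of 4, both bounds resolve to 10, so a minimum width is never undercut by a smaller maximum.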
- Changelog:
- Version 2.3.0 (2026-01-21):
Fixed issue where providing
float_precision had no actual impact on the display float precision used in output.
- Version 0.11.0 (2024-07-05):
New method.
- to_text(filename: str = None, index: bool = None, min_column_width: int = None, max_column_width: int = None, float_precision: int = None, horizontal_ellipses: str = '⠤⠄', index_rep: str = ' ', display_dimensions: bool = False, column_alignment: Literal['dynamic', 'left', 'center', 'right'] = None, table_style: Literal['ascii', 'bare', 'dash', 'default', 'double', 'latex', 'list', 'markdown', 'outline', 'pandas', 'polars', 'postgresql', 'round', 'rst-grid', 'rst-simple'] = None) str | None[source]
Returns a textual representation of the current
SQLDataModelas a string literal or by writing to file if afilenameis provided.- Parameters:
filename (str, optional) – The name of the file to write the text content. If provided, writes the text to the specified file. Default is None.
index (bool, optional) – Whether to include the index column in the text output. Default is value set on
SQLDataModel.display_index.min_column_width (int, optional) – The minimum column width for table cells. Default is value set on
SQLDataModel.min_column_width.max_column_width (int, optional) – The maximum column width for table cells. Default is value set on
SQLDataModel.max_column_width.float_precision (int, optional) – The precision for floating-point values. Default is value set on
SQLDataModel.display_float_precision.horizontal_ellipses (str, optional) – Characters to truncate column and cell values when horizontal truncation is required. Default is
'⠤⠄'. index_rep (str, optional) – String representation for the index. Default is ' ', a single whitespace character representing the index column. Only used when generating the index column, ignored when
index = False.display_dimensions (bool, optional) – Whether to include the model dimensions
[N rows x N cols]in the text output. Default is False.column_alignment (Literal['dynamic', 'left', 'center', 'right'], optional) – Column alignment. Default is value at
SQLDataModel.column_alignment.'dynamic': Dynamically aligns column content, right for numeric types and left for remaining types.'left': Left-aligns all column content.'center': Center-aligns all column content preferring left on uneven splits.'right': Right-aligns all column content.table_style (Literal['ascii','bare','dash','default','double','latex','list','markdown','outline','pandas','polars','postgresql','round','rst-grid','rst-simple'], optional) – The table styling to use. Default is value set on
SQLDataModel.table_style.
- Raises:
TypeError – If arguments are provided but are not the correct types:
filename(str),index(bool),min_column_width(int),max_column_width(int),float_precision(int).ValueError – If the
column_alignmentargument is provided and is not one of ‘dynamic’, ‘left’, ‘center’, or ‘right’.Exception – If there is an OS related error encountered when opening or writing to the provided
filename.
- Returns:
If
filenameis None, returns the textual representation as a string. Iffilenameis provided, writes the textual representation to the specified file and returns None.- Return type:
str or None
Examples:
Returning Text Literal
import sqldatamodel as sdm # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Michael', 35, 185.8) ] # Create the model df = sdm.SQLDataModel(data=data, headers=headers) # Generate text table literal text_table = df.to_text() # View output print(text_table)
This will output:
┌─────────┬──────┬────────┐ │ Name │ Age │ Height │ ├─────────┼──────┼────────┤ │ John │ 30 │ 175.3 │ │ Alice │ 28 │ 162.0 │ │ Michael │ 35 │ 185.8 │ └─────────┴──────┴────────┘
Write to File
import sqldatamodel as sdm # Sample data headers = ['Name', 'Age', 'Height'] data = [ ('John', 30, 175.3), ('Alice', 28, 162.0), ('Michael', 35, 185.8) ] # Create the model df = sdm.SQLDataModel(data=data, headers=headers) # Write the output to the file, center-aligning all columns df.to_text(filename='Table.txt', column_alignment='center')
Contents of
Table.txt:┌───┬─────────┬──────┬────────┐ │ │ Name │ Age │ Height │ ├───┼─────────┼──────┼────────┤ │ 0 │ John │ 30 │ 175.3 │ │ 1 │ Alice │ 28 │ 162.0 │ │ 2 │ Michael │ 35 │ 185.8 │ └───┴─────────┴──────┴────────┘
Important
Unlike output from
print(df) or other calls to SQLDataModel.__repr__(), the output from this method includes the full SQLDataModel and is not restricted by current terminal size or the value set at SQLDataModel.display_max_rows. As such, horizontal truncation only occurs on cell values as determined by max_column_width and no other horizontal or vertical table-wide truncation is performed.
Note
If
filenameis provided, the method writes the text to the specified file; otherwise, it returns the textual representation as a string.If
indexisNone, the method uses the current value onSQLDataModel.display_index.If
min_column_widthisNone, the method uses the current value onSQLDataModel.min_column_width.If
max_column_widthisNone, the method uses the current value onSQLDataModel.max_column_width.If
float_precisionisNone, the method uses the current value onSQLDataModel.display_float_precision.If
column_alignmentisNone, the method uses the current value onSQLDataModel.column_alignment.If
table_styleisNone, the method uses the current value onSQLDataModel.table_style.See
SQLDataModel.set_table_style()for modifying table format and available styles.See
SQLDataModel.to_string()for greater flexibility and control over generated string representations.
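The four `column_alignment` options described in the parameters can be sketched as a single padding function. This is an illustration of the documented rules, not the package's formatter. The name `align_cell` is hypothetical, and treating every `int` or `float` value as numeric is an assumption.

```python
def align_cell(value, width, alignment="dynamic"):
    """Pad a cell value to the given width using the documented alignment rules."""
    text = str(value)
    if alignment == "dynamic":
        # Right-align numeric values, left-align everything else
        alignment = "right" if isinstance(value, (int, float)) else "left"
    pad = max(width - len(text), 0)
    if alignment == "left":
        return text + " " * pad
    if alignment == "right":
        return " " * pad + text
    if alignment == "center":
        # On uneven splits the smaller share goes left, so content sits left of center
        left = pad // 2
        return " " * left + text + " " * (pad - left)
    raise ValueError("alignment must be one of 'dynamic', 'left', 'center', 'right'")

numeric_cell = align_cell(30, 6)
text_cell = align_cell("John", 6)
centered_cell = align_cell("Age", 6, "center")
```

The center branch is written by hand rather than with `str.center()`, because the built-in does not always favor the left on uneven splits.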
- Changelog:
- Version 0.11.0 (2024-07-05):
Added
horizontal_ellipses parameter to allow customizing truncation characters used when column or cell values exceed maximum column widths. Added
index_repparameter to allow customizing index column name with prior behavior set as default representation. Ignored whenindex = False.Modified to use
SQLDataModel.to_string()instead of generating independently formatted repr for more consistency between tabular outputs.
- Version 0.10.4 (2024-07-03):
Modified to escape newline characters through
utils.sqlite_printf_format()to avoid wrapping table rows.
- Version 0.9.3 (2024-06-28):
Added additional options ‘rst-simple’ and ‘rst-grid’ for
table_styleparameter.
- Version 0.3.10 (2024-04-16):
Added
table_styleparameter and updated output to reflect new formatting styles introduced in version 0.3.9.Added
display_dimensionsparameter to allow toggling display of table dimensions in output.
- Version 0.3.0 (2024-03-31):
Renamed
include_indexparameter toindexfor package consistency.
- to_xml(filename: str | None = None, root_tag: str = 'data', row_tag: str = 'row', column_tag: str = 'column', value_tag: str = 'value', orient: Literal['rows', 'columns'] = 'rows', index: bool | None = None, encoding: str = 'utf-8', pretty: bool = True, xml_declaration: bool = True) str | None[source]
Converts the
SQLDataModelinstance to XML format. Iffilenameis specified, writes the XML to file; otherwise returns the XML string literal.- Parameters:
filename (str | None) – Output file path. If None, returns XML as string.
root_tag (str, optional) – Root element name. Default is ‘data’.
row_tag (str, optional) – Row element name. Default is ‘row’.
column_tag (str, optional) – column element name. Default is ‘column’.
value_tag (str, optional) – value element name. Default is ‘value’.
orient (Literal['rows','columns'], optional) – Orientation of the XML output. -
'rows'(default): Each row is serialized as a<row>element. -'columns': Each column is serialized as a<column>element containing one or more<value>elements.index (bool | None) – Whether to include index column. Defaults to display_index.
encoding (str, optional) – Output encoding. Default ‘utf-8’.
pretty (bool, optional) – Whether to pretty-print XML output. Default True.
xml_declaration (bool, optional) – Whether to include the XML declaration
<?xml version="1.0" encoding="utf-8"?>at the top of the output. Default isTrue.
- Raises:
TypeError – If
filenameis provided and is not of typestrrepresenting a filepath to write the XML data to.OSError – If an error is encountered when trying to access or write to the specified file.
- Returns:
If
filenameis None, returns the XML representation as a string. Iffilenameis provided, writes the XML representation to the specified file and returns None.- Return type:
str or None
Example:
import sqldatamodel as sdm # Sample data headers = ['Name', 'Age', 'Grade'] data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2)] # Create the model and generate the XML data df = sdm.SQLDataModel(data, headers) xml_data = df.to_xml(index=False) # View the resulting XML literal print(xml_data)
This will output the XML representation of our sample data:
<data> <row> <Name>Alice</Name> <Age>25</Age> <Grade>3.8</Grade> </row> <row> <Name>Bob</Name> <Age>30</Age> <Grade>3.9</Grade> </row> <row> <Name>Charlie</Name> <Age>35</Age> <Grade>3.2</Grade> </row> </data>
Orient by columns:
import sqldatamodel as sdm # Sample data headers = ['Name', 'Age', 'Grade'] data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2)] # Create the model df = sdm.SQLDataModel(data, headers) # Alternatively, we can orient the XML data by columns instead of rows xml_by_cols = df.to_xml(orient='columns', index=False) # View the resulting XML literal print(xml_by_cols)
This will output the data in columnar orientation:
<data> <column name="Name"> <value>Alice</value> <value>Bob</value> <value>Charlie</value> </column> <column name="Age"> <value>25</value> <value>30</value> <value>35</value> </column> <column name="Grade"> <value>3.8</value> <value>3.9</value> <value>3.2</value> </column> </data>
Note
Columns with names that are not valid XML tags are serialized using a
<col>element with the original name stored in anameattribute for round-trip safety.The XML declaration can be excluded by setting
xml_declaration=False, which is useful when embedding the output as an XML fragment inside a larger document.When
orient='columns' is used, the output is fully compatible with SQLDataModel.from_xml(orient='columns') for lossless round-trip conversion.
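Both orientations, along with the fallback for column names that are not valid XML tags, can be reproduced with `xml.etree.ElementTree`. This is a sketch of the documented output shape, not the package's serializer. The function name `rows_to_xml` is hypothetical, the tag-name check is a deliberately simplified pattern, and `ET.indent` requires Python 3.9 or later.

```python
import re
import xml.etree.ElementTree as ET

# Simplified validity check for tag names (an assumption, not the full XML rule)
VALID_TAG = re.compile(r"[A-Za-z_][\w.-]*")

def rows_to_xml(headers, rows, orient="rows"):
    """Serialize tabular data to XML in row or column orientation."""
    root = ET.Element("data")
    if orient == "rows":
        for row in rows:
            row_el = ET.SubElement(root, "row")
            for name, value in zip(headers, row):
                if VALID_TAG.fullmatch(name):
                    cell = ET.SubElement(row_el, name)
                else:
                    # Keep the original name in an attribute for round trips
                    cell = ET.SubElement(row_el, "col", name=name)
                cell.text = str(value)
    elif orient == "columns":
        for position, name in enumerate(headers):
            col_el = ET.SubElement(root, "column", name=name)
            for row in rows:
                ET.SubElement(col_el, "value").text = str(row[position])
    else:
        raise ValueError("orient must be 'rows' or 'columns'")
    ET.indent(root)  # pretty-print, Python 3.9+
    return ET.tostring(root, encoding="unicode")

headers = ['Name', 'Age', 'Grade']
data = [('Alice', 25, 3.8), ('Bob', 30, 3.9), ('Charlie', 35, 3.2)]
xml_rows = rows_to_xml(headers, data)
xml_cols = rows_to_xml(headers, data, orient="columns")
xml_odd = rows_to_xml(['Full Name'], [('Alice Smith',)])
```

A header such as `Full Name` contains a space and cannot be used as a tag, so it is written as `<col name="Full Name">` instead.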
- Changelog:
- Version 2.3.1 (2026-01-22):
New method.
- transpose(infer_types: bool = True, include_headers: bool = False) SQLDataModel[source]
Transposes the model and returns as a new
SQLDataModel.- Parameters:
infer_types (bool, optional) – If types should be inferred after the transposition. Defaults to True.
include_headers (bool, optional) – If headers are included in the transposed data. Defaults to False.
- Returns:
The transposition of the model as a new SQLDataModel instance.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm # Create the model df = sdm.SQLDataModel([('A1', 'A2'), ('B1', 'B2'), ('C1', 'C2')]) # Transpose it df_transposed = df.transpose() # View original print(f"Original:\n{df}") # Along with transposed print(f"Transposed:\n{df_transposed}")
This will output the result of the transposition:
Original: ┌───┬─────┬─────┐ │ │ 0 │ 1 │ ├───┼─────┼─────┤ │ 0 │ A1 │ A2 │ │ 1 │ B1 │ B2 │ │ 2 │ C1 │ C2 │ └───┴─────┴─────┘ [3 rows x 2 columns] Transposed: ┌───┬─────┬─────┬─────┐ │ │ 0 │ 1 │ 2 │ ├───┼─────┼─────┼─────┤ │ 0 │ A1 │ B1 │ C1 │ │ 1 │ A2 │ B2 │ C2 │ └───┴─────┴─────┴─────┘ [2 rows x 3 columns]
Note
When
infer_types=False, the first row of the transposed result will be used to set thedtypesof the new model. This is generally a poor choice considering the nature of transposing data.If
include_headers=True, the headers will be included as the first row in the transposed data. Running this method twice in succession should return the original model,
sdm == sdm.transpose().transpose()
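For plain Python rows, the same operation and its round-trip property can be shown with `zip`. This is a sketch on lists of tuples rather than on a model. The function name `transpose_rows` is hypothetical, and type inference and header handling are left out.

```python
def transpose_rows(rows):
    """Swap rows and columns of a rectangular list of tuples."""
    return [tuple(column) for column in zip(*rows)]

original = [('A1', 'A2'), ('B1', 'B2'), ('C1', 'C2')]
transposed = transpose_rows(original)       # 3 rows x 2 columns becomes 2 x 3
round_trip = transpose_rows(transposed)     # transposing twice restores the input
```

After transposing, each column holds values that came from different original columns. That mixing is why inferring types again is the safer default.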
- Changelog:
- Version 0.3.5 (2024-04-08):
New method.
- unique(ignore_index: bool = True) SQLDataModel[source]
Returns a new model using the unique values of the current model, keeping the first by order of appearance.
- Parameters:
ignore_index (bool, optional) – If True, the original index of the unique values is ignored. If False, the original index is kept. Default is True.
- Returns:
A new model consisting of the unique values contained in the original model.
- Return type:
SQLDataModel
Example:
import sqldatamodel as sdm # Sample data data = [ ('Bob', 'Chicago'), ('Bob', 'Chicago'), ('Bob', 'Chicago'), ('Alice', 'New York'), ('Alice', 'New York'), ('Charles', 'Boston') ] # Create the model df = sdm.SQLDataModel(data, headers=['Name', 'City']) # Create a new model from only unique rows df_unique = df.unique() # View it print(df_unique)
This will output the first unique rows, ignoring the original indices:
┌───┬─────────┬──────────┐ │ │ Name │ City │ ├───┼─────────┼──────────┤ │ 0 │ Bob │ Chicago │ │ 1 │ Alice │ New York │ │ 2 │ Charles │ Boston │ └───┴─────────┴──────────┘ [3 rows x 2 columns]
Alternatively, the original index for each unique row can be retained:
# Do not ignore the indices df_unique_with_idx = df.unique(ignore_index=False) # View it print(df_unique_with_idx)
This will output a similar result, but the original indices of the rows kept are retained:
┌───┬─────────┬──────────┐ │ │ Name │ City │ ├───┼─────────┼──────────┤ │ 0 │ Bob │ Chicago │ │ 3 │ Alice │ New York │ │ 5 │ Charles │ Boston │ └───┴─────────┴──────────┘ [3 rows x 2 columns]
This method is particularly useful when needing to extract subsets of data, for example:
# Sample data headers = ['Name', 'Age', 'Department'] data = [ ('Alice', 38, 'HR'), ('Carol', 37, 'HR'), ('Billy', 23, 'Marketing'), ('Nate', 28, 'Sales'), ('Jill', 27, 'Sales'), ('John', 31, 'Engineering'), ('Kyle', 32, 'Engineering'), ] # Create the model df = sdm.SQLDataModel(data, headers) # Filter rows by 'Age' and return unique 'Department' values under_30_depts = df[df['Age'] < 30, 'Department'].unique() # View it print(under_30_depts)
This will output the unique ‘Department’ values for those rows matching the ‘Age’ filter:
┌───┬────────────┐ │ │ Department │ ├───┼────────────┤ │ 0 │ Marketing │ │ 1 │ Sales │ └───┴────────────┘ [2 rows x 1 columns]
Note
Null values are considered a unique value by SQLite and are treated accordingly when present.
See
SQLDataModel.deduplicate()to drop duplicates or modify the model in place based on duplicate values.See
SQLDataModel.count_unique()to generate counts of unique values by column.See
SQLDataModel.fillna()to fill missing or null values in the model if needed.
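The "keep the first by order of appearance" rule and the effect of `ignore_index` can be sketched with an insertion-ordered dict. This illustrates the documented behavior on plain tuples and is not how the package computes it. The function name `unique_rows` is hypothetical.

```python
def unique_rows(rows, ignore_index=True):
    """Return (index, *row) tuples for the first occurrence of each distinct row."""
    first_seen = {}
    for position, row in enumerate(rows):
        # setdefault keeps the position of the first occurrence only
        first_seen.setdefault(tuple(row), position)
    if ignore_index:
        # Renumber the surviving rows from zero
        return [(new_idx, *row) for new_idx, row in enumerate(first_seen)]
    # Keep the position each row had in the original data
    return [(orig_idx, *row) for row, orig_idx in first_seen.items()]

data = [
    ('Bob', 'Chicago'), ('Bob', 'Chicago'), ('Bob', 'Chicago'),
    ('Alice', 'New York'), ('Alice', 'New York'), ('Charles', 'Boston'),
]
renumbered = unique_rows(data)
original_idx = unique_rows(data, ignore_index=False)
```

The two results contain the same rows and differ only in the index values `0, 1, 2` versus `0, 3, 5`, matching the outputs shown above.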
- Changelog:
- Version 1.3.0 (2025-02-09):
New method.
- update_index_at(row_index: int, column_index: int | str, value: Any = None) None[source]
Updates a specific cell in the
SQLDataModelat the given row and column indices with the provided value.- Parameters:
row_index (int) – The index of the row to be updated.
column_index (int or str) – The index or name of the column to be updated.
value (Any, optional) – The new value to be assigned to the specified cell.
- Raises:
TypeError – If
row_indexis not of type ‘int’ or ifcolumn_indexis not of type ‘int’ or ‘str’.IndexError – If row or column provided as an ‘int’ but is outside of the current model row or column range.
ValueError – If column provided as a ‘str’ but is not found in the current model headers.
SQLProgrammingError – If there is an issue with the SQL execution during the update.
- Returns:
None
Example:
import sqldatamodel as sdm # Create an initial 3x3 model filled with dashes df = sdm.from_shape((3,3), fill='---', headers=['A', 'B', 'C']) # Update cell based on integer indices df.update_index_at(0, 0, 'Top Left') df.update_index_at(0, 2, 'Top Right') # Update cell based on row index and column name df.update_index_at(2, 'A', 'Bottom Left') df.update_index_at(2, 'C', 'Bottom Right') # Update based on negative row and column indexing df.update_index_at(-2, -2, 'Center') # View result print(df)
This will output the cumulative result of our updates:
┌───┬─────────────┬────────┬──────────────┐ │ │ A │ B │ C │ ├───┼─────────────┼────────┼──────────────┤ │ 0 │ Top Left │ --- │ Top Right │ │ 1 │ --- │ Center │ --- │ │ 2 │ Bottom Left │ --- │ Bottom Right │ └───┴─────────────┴────────┴──────────────┘ [3 rows x 3 columns]
Important
Indexing is done using zero-based integers and not done by index value. Most of the time this distinction is irrelevant as the row index at position ‘0’ will have an index value of ‘0’, however this can change after transformation operations like filter or sort. To reset and realign the index value use
SQLDataModel.reset_index() or use SQLDataModel.indicies to view the current row indices.
Note
This method only updates individual cells in the current model based on integer indexing for both rows and columns using their (row, column) position.
To broadcast updates across row and column dimensions use the syntax of
sdm[row, column] = valueor seeSQLDataModel.__setitem__()for more details.
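The positional indexing rules, including negative positions and column names, can be sketched on a nested list. The helpers `resolve_position`, `resolve_column` and `update_at` are hypothetical and only mirror the documented error types. They are not the package's internals.

```python
def resolve_position(position, length):
    """Convert a possibly negative position into a zero-based offset."""
    if not isinstance(position, int):
        raise TypeError(f"expected int, got {type(position).__name__}")
    if not -length <= position < length:
        raise IndexError(f"position {position} out of range for length {length}")
    return position % length

def resolve_column(column, headers):
    """Accept a column name or an integer position."""
    if isinstance(column, str):
        if column not in headers:
            raise ValueError(f"column '{column}' not found in headers")
        return headers.index(column)
    return resolve_position(column, len(headers))

def update_at(grid, headers, row, column, value):
    """Update one cell by (row, column) position."""
    grid[resolve_position(row, len(grid))][resolve_column(column, headers)] = value

headers = ['A', 'B', 'C']
grid = [['---'] * 3 for _ in range(3)]
update_at(grid, headers, 0, 0, 'Top Left')
update_at(grid, headers, 0, 2, 'Top Right')
update_at(grid, headers, 2, 'A', 'Bottom Left')
update_at(grid, headers, 2, 'C', 'Bottom Right')
update_at(grid, headers, -2, -2, 'Center')

# Position versus index value: after a filter the index values may be 4, 7, 9,
# yet position 0 still refers to the first remaining row
index_values = [4, 7, 9]
first_index_value = index_values[resolve_position(0, len(index_values))]
```

The final lines show the distinction made in the Important box: position `0` selects the first remaining row even when that row's index value is `4`.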
- Changelog:
- Version 0.8.0 (2024-06-21):
Modified to allow
row_indexandcolumn_indexarguments the same input type flexibility found across package, allowing both to be referenced directly or by their integer index.
- Version 0.5.2 (2024-05-13):
Modified
row_indexparameter to useSQLDataModel.indiciesto index into rows in lieu of row index value equality.
- Version 0.1.9 (2024-03-19):
New method.
- vstack(*other: SQLDataModel, inplace: bool = False) SQLDataModel[source]
Vertically stacks one or more
SQLDataModelobjects to the current model.- Parameters:
other (SQLDataModel or sequence of SQLDataModel) – The SQLDataModel objects to vertically stack.
inplace (bool, optional) – If True, performs the vertical stacking in-place, modifying the current model. Defaults to False, returning a new
SQLDataModel.
- Returns:
The vertically stacked SQLDataModel instance when inplace is False.
- Return type:
SQLDataModel
- Raises:
ValueError – If no additional SQLDataModels are provided for vertical stacking.
TypeError – If any argument in ‘other’ is not of type SQLDataModel, list, or tuple.
SQLProgrammingError – If an error occurs when updating the model values in place.
Example:
    import sqldatamodel as sdm

    # Create models A and B
    df_a = sdm.SQLDataModel([('A', 1), ('B', 2)], headers=['A1', 'A2'])
    df_b = sdm.SQLDataModel([('C', 3), ('D', 4)], headers=['B1', 'B2'])

    # Vertically stack B onto A
    df_ab = df_a.vstack(df_b)

    # View stacked model
    print(df_ab)
This will output the result of stacking B onto A, using the base model columns and dtypes:
    ┌─────┬─────┐
    │ A1  │  A2 │
    ├─────┼─────┤
    │ A   │   1 │
    │ B   │   2 │
    │ C   │   3 │
    │ D   │   4 │
    └─────┴─────┘
    [4 rows x 2 columns]
Multiple models can be stacked simultaneously, here we vertically stack 3 models:
    # Create a third model C
    df_c = sdm.SQLDataModel([('E', 5), ('F', 6)], headers=['C1', 'C2'])

    # Vertically stack all three models
    df_abc = df_a.vstack([df_b, df_c])

    # View stacked result
    print(df_abc)
This will output the result of stacking C and B onto A:
    ┌─────┬─────┐
    │ A1  │  A2 │
    ├─────┼─────┤
    │ A   │   1 │
    │ B   │   2 │
    │ C   │   3 │
    │ D   │   4 │
    │ E   │   5 │
    │ F   │   6 │
    └─────┴─────┘
    [6 rows x 2 columns]
Note
Headers and data types are inherited from the model calling the SQLDataModel.vstack() method, casting stacked values corresponding to the base model types.
Model dimensions will be truncated or padded to coerce compatible dimensions when stacking, use SQLDataModel.concat() for strict concatenation instead of vstack.
See SQLDataModel.insert_row() for inserting new values or types other than SQLDataModel directly into the current model.
See SQLDataModel.hstack() for horizontal stacking.
- Changelog:
- Version 0.3.4 (2024-04-05):
New method.
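The truncate-or-pad behavior described in the note above can be pictured with plain Python lists. This is only a sketch of the idea, not the library's implementation: the helper name `vstack_rows` is hypothetical, and padding short rows with `None` is an assumption.

```python
def vstack_rows(base_rows, *others):
    """Stack rows from other tables under base_rows, coercing width to the base."""
    # The base table decides how many columns the result has
    width = len(base_rows[0]) if base_rows else 0
    stacked = [list(row) for row in base_rows]
    for rows in others:
        for row in rows:
            row = list(row)[:width]              # truncate rows wider than the base
            row += [None] * (width - len(row))   # pad rows narrower than the base (assumed fill value)
            stacked.append(row)
    return stacked

rows_a = [('A', 1), ('B', 2)]
rows_b = [('C', 3, 'extra'), ('D',)]
result = vstack_rows(rows_a, rows_b)
print(result)
```

The base table alone determines the output shape here, which mirrors the statement that headers and types are inherited from the calling model.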
- where(predicate: str) SQLDataModel[source]
Filters the rows of the current SQLDataModel object based on the specified SQL predicate and returns a new SQLDataModel containing only the rows that satisfy the condition. Only the predicate is needed, because the method prepends the select clause as “select [current model columns] where [predicate]”; see below for detailed examples.
- Parameters:
predicate (str) – The SQL predicate used for filtering rows, that is, the text that follows the ‘where’ keyword in a normal SQL statement.
- Raises:
TypeError – If the provided predicate argument is not of type str.
SQLProgrammingError – If the provided string is invalid or malformed SQL when executed against the model.
- Returns:
A new SQLDataModel containing rows that satisfy the specified predicate.
- Return type:
SQLDataModel
Example:
    import sqldatamodel as sdm

    # Sample data
    headers = ['Name', 'Age', 'Job']
    data = [
        ('Billy', 30, 'Barber'),
        ('Alice', 28, 'Doctor'),
        ('John', 25, 'Technician'),
        ('Travis', 35, 'Musician'),
        ('William', 15, 'Student')
    ]

    # Create the model
    df = sdm.SQLDataModel(data, headers)

    # Filter model by 'Age' > 20
    df_filtered = df.where('Age > 20')

    # View result
    print(df_filtered)
This will output:
    ┌───┬────────┬──────┬────────────┐
    │   │ Name   │  Age │ Job        │
    ├───┼────────┼──────┼────────────┤
    │ 0 │ Billy  │   30 │ Barber     │
    │ 1 │ Alice  │   28 │ Doctor     │
    │ 2 │ John   │   25 │ Technician │
    │ 3 │ Travis │   35 │ Musician   │
    └───┴────────┴──────┴────────────┘
    [4 rows x 3 columns]
Filter by multiple conditions:
    # Filter by 'Job' and 'Age'
    df_filtered = df.where("Job = 'Student' and Age < 18")

    # View result
    print(df_filtered)
This will output:
    ┌───┬─────────┬──────┬─────────┐
    │   │ Name    │  Age │ Job     │
    ├───┼─────────┼──────┼─────────┤
    │ 4 │ William │   15 │ Student │
    └───┴─────────┴──────┴─────────┘
    [1 rows x 3 columns]
Note
predicate can be any valid SQL; for example, ordering can be achieved without any filtering by simply using the argument '(1=1) order by "age" asc'.
If predicate is not of type str, a TypeError is raised; if it is not valid SQL, a SQLProgrammingError is raised.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
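The way a predicate is appended to a generated select clause can be sketched with the standard library sqlite3 module. The table name `people` and the standalone `where` helper below are hypothetical and only illustrate the "select [columns] where [predicate]" pattern described above; they are not the library's internals.

```python
import sqlite3

# In-memory table standing in for the model's underlying data
conn = sqlite3.connect(":memory:")
conn.execute('create table people ("Name" text, "Age" integer, "Job" text)')
conn.executemany(
    "insert into people values (?, ?, ?)",
    [
        ("Billy", 30, "Barber"),
        ("Alice", 28, "Doctor"),
        ("John", 25, "Technician"),
        ("Travis", 35, "Musician"),
        ("William", 15, "Student"),
    ],
)

def where(predicate: str) -> list:
    """Prepend the select clause and run the caller's predicate as-is."""
    if not isinstance(predicate, str):
        raise TypeError("predicate must be of type 'str'")
    sql = f'select "Name", "Age", "Job" from people where {predicate}'
    return conn.execute(sql).fetchall()

students = where("Job = 'Student' and Age < 18")
ordered = where('(1=1) order by "Age" asc')
print(students)
print([name for name, _, _ in ordered])
```

Because the predicate is interpolated directly into the statement, this pattern is only appropriate for trusted input; a malformed predicate surfaces here as `sqlite3.OperationalError`.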
ANSIColor
- class sqldatamodel.ansicolor.ANSIColor(text_color: str | tuple = None, text_bold: bool = False)[source]
Bases: object
Creates an ANSI-style terminal color using the provided hex color or RGB values.
- Variables:
text_color (str or tuple) – Hex color code or RGB tuple.
text_bold (bool) – Whether text should be bold.
- Raises:
ValueError – If provided string is not a valid hex color code or if provided rgb tuple is invalid.
TypeError – If provided text_color or text_bold parameters are of invalid types.
Example:
    from sqldatamodel.ansicolor import ANSIColor

    # Create a pen by specifying a color in hex or rgb:
    green_bold = ANSIColor("#00ff00", text_bold=True)

    # Create a string to use as a sample:
    regular_str = "Hello, World!"

    # Color the string using the wrap method:
    green_str = green_bold.wrap(regular_str)

    # Print the string in the terminal to see the color applied:
    print(f"original string: {regular_str}, green string: {green_str}")

    # Get rgb values from existing color
    print(green_bold.to_rgb())  # Output: (0, 255, 0)
- Changelog:
- Version 0.10.2 (2024-06-30):
Added random color selection when initialized without a text_color argument.
Added dictionary of color values at ANSIColor.Colors to use as selection pool.
Modified ANSIColor.__repr__() to always return hex value as a string for consistency regardless of original input format.
- __init__(text_color: str | tuple = None, text_bold: bool = False) None[source]
Initializes the ANSIColor object with the specified text color and bold setting, referred to as the ‘pen’ throughout documentation.
- Parameters:
text_color (str or tuple) – Hex color code or RGB tuple. If not provided, a random color will be selected.
text_bold (bool) – Whether text should be bold (default: False)
Example:
    from sqldatamodel.ansicolor import ANSIColor

    # Initialize from hex value with normal weight
    color = ANSIColor("#00ff00")

    # Initialize from rgb value with bold weight
    color = ANSIColor((0,255,0), text_bold=True)

    # Surprise me! Initialize pen from random color
    color = ANSIColor()
- Changelog:
- Version 0.10.2 (2024-06-30):
Modified to randomly select a color from ANSIColor.Colors when text_color = None for demonstration purposes.
Note
The string returned by ANSIColor.__repr__() always contains the hex value of the pen regardless of the text_color format.
See ANSIColor.text_color_str to view originally provided format of text_color.
See ANSIColor.text_color_rgb to view the RGB tuple equivalent of text_color.
See ANSIColor.text_color_hex to view the hex value equivalent of text_color.
- __repr__() str[source]
The string representation used for instances of ANSIColor, displayed with the pen set at ANSIColor.text_color_str and formatted to allow object recreation.
- Returns:
The string representation as ANSIColor('hexvalue') colored with the ANSI terminal color.
- Return type:
str
Example:
    from sqldatamodel.ansicolor import ANSIColor

    # Create the pen from a hex value
    color = ANSIColor('#EFAC65')

    # View representation
    print(color)
This will output:
    ANSIColor('#EFAC65')
Creating a pen using the equivalent RGB tuple results in the same output:
    # From the RGB equivalent values
    color = ANSIColor((239, 172, 101))

    # View representation
    print(color)
This will also output:
    ANSIColor('#EFAC65')
Note
The representation will always be formatted using the hex value for consistency and recreation.
Use ANSIColor.to_rgb() to view the RGB values for an existing pen.
- classmethod rand_color() ANSIColor[source]
Create a new ANSIColor pen by randomly selecting one from a preexisting pool of options.
- Returns:
A new ANSIColor instance created using a randomly selected color.
- Return type:
ANSIColor
Example:
    from sqldatamodel.ansicolor import ANSIColor

    # Surprise me!
    rand_color = ANSIColor.rand_color()

    # See what we got
    print(rand_color)
We got a nice orange color with this hex value:
    ANSIColor('#F89F1F')
Note
See ANSIColor.Colors for dictionary of values being used as random color selection pool.
- Changelog:
- Version 0.10.2 (2024-06-30):
Added to allow a random color to be selected for sqldatamodel.SQLDataModel.set_display_color().
New method.
- text_color_hex[source]
The hex value of the color, uppercased and prepended with ‘#’ to reflect hexadecimal format ranging from '#000000' to '#FFFFFF'.
- Type:
str
- text_color_rgb[source]
The RGB value of the color as a tuple of integers reflecting the (red, green, blue) values satisfying 0 <= value <= 255.
- Type:
tuple[int, int, int]
- text_color_str[source]
The input color used to create the pen in the originally provided format.
- Type:
str
- to_rgb() tuple[source]
Returns the text color attribute as a tuple in the format (r, g, b).
- Returns:
RGB tuple.
- Return type:
tuple
Example:
    from sqldatamodel.ansicolor import ANSIColor

    # Create the color
    color = ANSIColor("#00ff00")

    # Get the rgb values
    print(color.to_rgb())  # Output: (0, 255, 0)
- wrap(text: str) str[source]
Wraps the provided text in the style of the pen.
- Parameters:
text (str) – Text to be wrapped.
- Returns:
Wrapped text with ANSI escape codes.
- Return type:
str
Example:
    from sqldatamodel.ansicolor import ANSIColor

    # Create the color
    blue_color = ANSIColor("#0000ff")

    # Create a sample string
    message = "This string is currently unstyled"

    # Wrap the string to change its styling whenever it's printed
    blue_message = blue_color.wrap(message)

    # Print the styled message
    print(blue_message)

    # Or style string or string object directly in the print statement
    print(blue_color.wrap("I'm going to turn blue!"))
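For readers curious what wrapping text in a pen involves, the following standalone sketch converts a hex color to RGB and surrounds text with 24-bit ANSI escape sequences. The function names `hex_to_rgb` and `wrap_text` are hypothetical, and the exact escape sequence emitted by ANSIColor is an assumption; only the general truecolor SGR format is standard.

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Convert a '#rrggbb' string into an (r, g, b) tuple of integers."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"invalid hex color: {hex_color!r}")
    # int(..., 16) raises ValueError on non-hexadecimal characters
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))

def wrap_text(text: str, hex_color: str, bold: bool = False) -> str:
    """Surround text with a truecolor foreground sequence and a reset."""
    r, g, b = hex_to_rgb(hex_color)
    weight = "1;" if bold else ""
    return f"\033[{weight}38;2;{r};{g};{b}m{text}\033[0m"

print(hex_to_rgb("#00ff00"))
print(wrap_text("I'm going to turn blue!", "#0000ff"))
```

Terminals without truecolor support may ignore or approximate the `38;2;r;g;b` sequence.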
HTMLParser
- class sqldatamodel.htmlparser.HTMLParser(*, convert_charrefs: bool = True, cell_sep: str = ' ', table_identifier: int | str = 1)[source]
Bases: HTMLParser
An HTML parser designed to extract tables from HTML content.
This parser subclasses HTMLParser from the standard library to parse HTML content. It extracts tables from the HTML and provides methods to access the table data.
- Variables:
convert_charrefs (bool) – Flag indicating whether to convert character references to Unicode characters. Default is True.
cell_sep (str) – Separator string to separate cells within a row. Default is a single space.
table_identifier (int or str) – Identifier used to locate the target table. It can be either an integer representing the table index, or a string representing the HTML ‘name’ or ‘id’ attribute of the table.
_in_td (bool) – Internal flag indicating whether the parser is currently inside a <td> tag.
_in_th (bool) – Internal flag indicating whether the parser is currently inside a <th> tag.
_current_table (list) – List to hold the current table being parsed.
_current_row (list) – List to hold the current row being parsed.
_current_cell (list) – List to hold the current cell being parsed.
_ignore_next (bool) – Internal flag indicating whether the next token should be ignored.
found_target (bool) – Flag indicating whether the target table has been found.
_is_finished (bool) – Internal flag indicating whether parsing is finished.
table_counter (int) – Counter to keep track of the number of tables encountered during parsing.
target_table (list) – List to hold the data of the target table once found.
- Changelog:
- Version 0.9.0 (2024-06-26):
Modified integer indexing of table elements found to use one-based indexing instead of zero-based indexing to align with similar method usage across package.
- handle_data(data: str) None[source]
Handle the data within an HTML tag during parsing.
- Parameters:
data (str) – The data contained within the HTML tag.
- handle_endtag(tag: str) None[source]
Handle the end of an HTML tag during parsing and modify the parsing tags accordingly.
- Parameters:
tag (str) – The name of the HTML tag encountered.
- handle_starttag(tag: str, attrs: list[str]) None[source]
Handle the start of an HTML tag during parsing.
- Parameters:
tag (str) – The name of the HTML tag encountered.
attrs (list[str]) – A list of (name, value) pairs representing the attributes of the tag.
- validate_table() None[source]
Validate and retrieve the target HTML table data based on the table_identifier used for parsing.
- Returns:
A tuple containing the table data and headers (if present).
- Return type:
tuple[list, list|None]
- Raises:
ValueError – If the target table is not found or cannot be parsed.
Note
SQLDataModel.from_html() uses this class to extract valid HTML tables from either web or file content.
If a row is found with mismatched dimensions, it will be filled with None values to ensure tabular output.
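The general approach of subclassing the standard library parser to pull out one table can be sketched as follows. This is a simplified illustration rather than the package's parser: the class `TableSketchParser`, its attributes, and the `padded_rows` helper are hypothetical, it selects tables only by one-based position, and it does not handle nested tables.

```python
from html.parser import HTMLParser

class TableSketchParser(HTMLParser):
    """Collect the rows of the table at a one-based position in the document."""

    def __init__(self, table_index: int = 1):
        super().__init__(convert_charrefs=True)
        self.table_index = table_index
        self.table_counter = 0
        self.in_target = False
        self.in_cell = False
        self.rows = []
        self.current_row = []
        self.current_cell = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_counter += 1
            self.in_target = self.table_counter == self.table_index
        elif self.in_target and tag == "tr":
            self.current_row = []
        elif self.in_target and tag in ("td", "th"):
            self.in_cell = True
            self.current_cell = []

    def handle_data(self, data):
        if self.in_cell:
            self.current_cell.append(data.strip())

    def handle_endtag(self, tag):
        if not self.in_target:
            return
        if tag in ("td", "th"):
            self.in_cell = False
            self.current_row.append(" ".join(p for p in self.current_cell if p))
        elif tag == "tr":
            self.rows.append(self.current_row)
        elif tag == "table":
            self.in_target = False

    def padded_rows(self):
        # Fill short rows with None so every row has the same width
        width = max((len(row) for row in self.rows), default=0)
        return [row + [None] * (width - len(row)) for row in self.rows]

html_source = (
    "<table><tr><td>skip</td></tr></table>"
    "<table><tr><th>Name</th><th>Age</th></tr><tr><td>Alice</td></tr></table>"
)
parser = TableSketchParser(table_index=2)
parser.feed(html_source)
table = parser.padded_rows()
print(table)
```

The second row has only one cell, so it is padded with `None`, matching the tabular guarantee described in the note above.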
JSONEncoder
- class sqldatamodel.jsonencoder.DataTypesEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]
Bases: JSONEncoder
Custom JSON encoder that extends the functionality of json.JSONEncoder to handle additional data types.
- Serialization:
datetime.date: Serialized as a string in the format ‘YYYY-MM-DD’.
datetime.datetime: Serialized as a string in the format ‘YYYY-MM-DD HH:MM:SS’.
bytes: Decoded from UTF-8 into a string.
Note
The date and datetime types can be recovered using the SQLDataModel.infer_dtypes() method.
The bytes information is not converted back into bytes.
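A minimal sketch of an encoder with the serialization rules listed above, built on the standard `json.JSONEncoder.default` hook. The class name `SketchEncoder` is hypothetical and the package's own encoder may differ in detail.

```python
import datetime
import json

class SketchEncoder(json.JSONEncoder):
    """Serialize date, datetime and bytes values that json cannot handle natively."""

    def default(self, obj):
        # datetime is checked first because datetime.datetime subclasses datetime.date
        if isinstance(obj, datetime.datetime):
            return obj.strftime("%Y-%m-%d %H:%M:%S")
        if isinstance(obj, datetime.date):
            return obj.strftime("%Y-%m-%d")
        if isinstance(obj, bytes):
            return obj.decode("utf-8")
        # Defer to the base class, which raises TypeError for unsupported types
        return super().default(obj)

record = {
    "day": datetime.date(2024, 3, 19),
    "at": datetime.datetime(2024, 3, 19, 8, 30, 0),
    "raw": b"abc",
}
encoded = json.dumps(record, cls=SketchEncoder)
print(encoded)
```

After a round trip through JSON all three values come back as plain strings, which is why a separate type inference step is needed to recover dates.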
StandardDeviation
- class sqldatamodel.standarddeviation.StandardDeviation[source]
Bases: object
Implementation of standard deviation as an aggregate function for SQLite:
\[\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}\]
- Where:
\(x_i\) represents each individual data point in the population.
\(\mu\) is the population mean.
\(N\) is the total number of data points in the population.
This class provides methods to calculate the standard deviation of a set of values in an SQLite query using the aggregate function mechanism.
- Variables:
M (float) – The running mean of the values.
S (float) – The running sum of the squared differences from the mean.
k (int) – The count of non-null input values.
Note
See SQLDataModel.describe() for statistical implementation.
- finalize() float[source]
Compute the final standard deviation as part of the sqlite3 user-defined aggregate function.
- Returns:
The computed standard deviation if the count is greater than or equal to 3, else None.
- Return type:
float or None
Note
This returns the population standard deviation, not the sample standard deviation. It measures the spread or dispersion of a set of data points using the entire population.
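A running mean `M`, running sum of squared differences `S` and count `k` are the ingredients of Welford's online algorithm, so a user-defined aggregate along these lines can be sketched with the standard sqlite3 module. The class name `PopulationStdDev` and the registered SQL name `stdev` are hypothetical. Unlike the threshold documented above, this sketch returns None only when no non-null values were seen.

```python
import math
import sqlite3

class PopulationStdDev:
    """Population standard deviation as a sqlite3 aggregate (Welford's method)."""

    def __init__(self):
        self.M = 0.0  # running mean
        self.S = 0.0  # running sum of squared differences from the mean
        self.k = 0    # count of non-null values

    def step(self, value):
        if value is None:
            return
        self.k += 1
        delta = value - self.M
        self.M += delta / self.k
        self.S += delta * (value - self.M)

    def finalize(self):
        if self.k == 0:
            return None
        # Divide by N (not N - 1) for the population formula
        return math.sqrt(self.S / self.k)

conn = sqlite3.connect(":memory:")
conn.create_aggregate("stdev", 1, PopulationStdDev)
conn.execute("create table t (x real)")
conn.executemany("insert into t values (?)", [(v,) for v in (2, 4, 4, 4, 5, 5, 7, 9)])
result = conn.execute("select stdev(x) from t").fetchone()[0]
print(result)  # approximately 2.0 for this data set
```

The single-pass update avoids storing the values and is numerically steadier than accumulating the sum and sum of squares separately.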
utils
- sqldatamodel.utils._create_connection(url: str) Connection | Any[source]
Parses database connection url into component parameters and creates the specified connection.
- Parameters:
url (str) – The url connection string provided in the format of 'scheme://user:pass@host:port/path'
- Raises:
ValueError – If scheme is provided and not one of the currently supported driver formats.
ModuleNotFoundError – If required driver for specified scheme is not installed or not found.
- Returns:
The driver connection object for the scheme specified.
- Return type:
Connection (sqlite3.Connection | Any)
- Supported Formats:
SQLite using sqlite3 with format 'file:///path/to/database.db'
PostgreSQL using psycopg2 with format 'postgresql://user:pass@hostname:port/db'
SQL Server ODBC using pyodbc with format 'mssql://user:pass@hostname:port/db'
Oracle using cx_Oracle with format 'oracle://user:pass@hostname:port/db'
Teradata using teradatasql with format 'teradata://user:pass@hostname:port/db'
Examples:
SQLite
    import sqldatamodel as sdm

    # SQLite connection url
    url = 'file:///home/database/users.db'

    # Parse and create sqlite3 connection
    conn = sdm.SQLDataModel._create_connection(url)
PostgreSQL
    import sqldatamodel as sdm

    # Sample url
    url = 'postgresql://scott:tiger@12.34.56.78:5432/pgdb'

    # Parse and create psycopg2 connection
    conn = sdm.SQLDataModel._create_connection(url)
Note
Used by SQLDataModel.from_sql() and SQLDataModel.to_sql() to parse and create connection objects from url.
See SQLDataModel._parse_connection_url() for implementation on parsing url properties from connection string.
- Changelog:
- Version 0.9.2 (2024-06-27):
New method.
- sqldatamodel.utils._parse_connection_url(url: str) NamedTuple[source]
Parses database connection url into component parameters and returns the parsed components as a NamedTuple.
- Parameters:
url (str) – The url connection string provided in the format of 'scheme://user:pass@host:port/path'
- Raises:
AttributeError – If url provided could not be parsed into expected component properties.
ValueError – If scheme is not provided or is not one of the currently supported driver formats or module aliases below:
SQLite: 'file' or 'sqlite3'
PostgreSQL: 'postgresql' or 'psycopg2'
SQL Server ODBC: 'mssql' or 'pyodbc'
Oracle: 'oracle' or 'cx_oracle'
Teradata: 'teradata' or 'teradatasql'
- Returns:
The parsed details as ConnectionDetails('scheme', 'user', 'cred', 'host', 'port', 'db')
- Return type:
ConnectionDetails
- Supported Formats:
SQLite using sqlite3 with format 'file:///path/to/database.db'
PostgreSQL using psycopg2 with format 'postgresql://user:pass@hostname:port/db'
SQL Server ODBC using pyodbc with format 'mssql://user:pass@hostname:port/db'
Oracle using cx_Oracle with format 'oracle://user:pass@hostname:port/db'
Teradata using teradatasql with format 'teradata://user:pass@hostname:port/db'
Example:
    import sqldatamodel as sdm

    # SQLite connection url
    url = 'file:///home/database/users.db'

    # Parse the connection properties
    url_props = sdm.SQLDataModel._parse_connection_url(url)

    # View attributes
    print(url_props)
This will output the connection details for a local SQLite database file:
    ConnectionDetails(
        scheme='file',
        user=None,
        cred=None,
        host=None,
        port=None,
        db='/home/database/users.db'
    )
PostgreSQL connections can be parsed from a valid format:
    import sqldatamodel as sdm

    # PostgreSQL connection url
    url = 'postgresql://scott:tiger@12.34.56.78:5432/pgdb'

    # Parse the connection properties
    url_props = sdm.SQLDataModel._parse_connection_url(url)

    # View attributes
    print(url_props)
This will output the connection details for a PostgreSQL connection:
    ConnectionDetails(
        scheme='postgresql',
        user='scott',
        cred='tiger',
        host='12.34.56.78',
        port=5432,
        db='pgdb'
    )
Note
This method is used by SQLDataModel._create_connection() to parse details from url and create a connection object.
This method can be used by SQLDataModel.from_sql() and SQLDataModel.to_sql() to parse connection details when the connection parameter is provided as a string.
- Changelog:
- Version 0.9.3 (2024-06-28):
Modified behavior when scheme is not provided, treating the url as a file path when parsed in the absence of auth-related properties, to retain the prior version behavior of creating a new sqlite3 database file when a path is provided.
Added driver module names as valid aliases for relevant connection drivers, valid schemes now include ‘file’, ‘sqlite3’, ‘postgresql’, ‘psycopg2’, ‘mssql’, ‘pyodbc’, ‘oracle’, ‘cx_oracle’, ‘teradata’, ‘teradatasql’.
- Version 0.9.2 (2024-06-27):
Modified to use urllib.parse.urlparse instead of adding a third-party package dependency.
- Version 0.9.1 (2024-06-27):
New method.
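Since the changelog mentions `urllib.parse.urlparse`, the parsing step can be approximated as below. This is a sketch under assumptions: the helper name `parse_url_sketch`, the `SUPPORTED` set and the handling of the path component are illustrative, and the package's real function may treat edge cases (such as a missing scheme) differently.

```python
from collections import namedtuple
from urllib.parse import urlparse

ConnectionDetails = namedtuple("ConnectionDetails", "scheme user cred host port db")

# Schemes listed in the documentation above
SUPPORTED = {
    "file", "sqlite3", "postgresql", "psycopg2", "mssql",
    "pyodbc", "oracle", "cx_oracle", "teradata", "teradatasql",
}

def parse_url_sketch(url: str) -> ConnectionDetails:
    """Split a connection url into its components using the standard library."""
    parts = urlparse(url)
    if parts.scheme not in SUPPORTED:
        raise ValueError(f"unsupported or missing scheme: {parts.scheme!r}")
    # Keep the full path for local files, strip the leading slash for database names
    db = parts.path if parts.scheme in ("file", "sqlite3") else parts.path.lstrip("/")
    return ConnectionDetails(
        parts.scheme, parts.username, parts.password, parts.hostname, parts.port, db or None
    )

sqlite_details = parse_url_sketch("file:///home/database/users.db")
pg_details = parse_url_sketch("postgresql://scott:tiger@12.34.56.78:5432/pgdb")
print(sqlite_details)
print(pg_details)
```

`urlparse` already exposes `username`, `password`, `hostname` and an integer `port`, so no manual splitting of the netloc is needed.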
- sqldatamodel.utils.alias_duplicates(headers: list) Generator[source]
Rename duplicate column names in a given list by appending an underscore and a numerical suffix.
- Parameters:
headers (list) – A list of column names that require parsing for duplicates.
- Yields:
Generator – A generator object that yields the original or modified column names.
Example:
    import sqldatamodel as sdm

    # Original list of column names with duplicates
    original_headers = ['ID', 'Name', 'Amount', 'Name', 'Date', 'Amount']

    # Use the static method to return a generator for the duplicates
    renamed_generator = sdm.SQLDataModel.alias_duplicates(original_headers)

    # Obtain the modified column names
    modified_headers = list(renamed_generator)

    # View modified column names
    print(modified_headers)

    # Output
    modified_headers = ['ID', 'Name', 'Amount', 'Name_2', 'Date', 'Amount_2']
Example of implementation for SQLDataModel:
    # Given a list of headers
    original_headers = ['ID', 'ID', 'Name', 'Name', 'Name', 'Unique']

    # Create a separate list for aliasing duplicates
    aliased_headers = list(SQLDataModel.alias_duplicates(original_headers))

    # View aliases
    for col, alias in zip(original_headers, aliased_headers):
        print(f"{col} as {alias}")
This will output:
    ID as ID
    ID as ID_2
    Name as Name
    Name as Name_2
    Name as Name_3
    Unique as Unique
Note
Used by SQLDataModel.execute_fetch() when column selection is unknown and may require duplicate aliasing.
- Changelog:
- Version 0.3.4 (2024-04-05):
Modified to re-alias partially aliased input to prevent runaway incrementation on suffixes.
- Version 0.1.9 (2024-03-19):
New method.
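The suffixing rule shown in the examples above (first occurrence unchanged, later occurrences numbered from 2) can be reproduced with a small generator. The name `alias_duplicates_sketch` is hypothetical, and this version deliberately omits the re-aliasing of partially aliased input mentioned in the changelog.

```python
def alias_duplicates_sketch(headers):
    """Yield each header, appending '_<n>' to the n-th repeat of a name."""
    seen = {}
    for name in headers:
        count = seen.get(name, 0) + 1
        seen[name] = count
        # The first occurrence keeps its name, repeats get a numeric suffix
        yield name if count == 1 else f"{name}_{count}"

original_headers = ['ID', 'ID', 'Name', 'Name', 'Name', 'Unique']
aliased_headers = list(alias_duplicates_sketch(original_headers))
print(aliased_headers)
```

A generator keeps the function lazy, so callers can zip it against the original names without building an intermediate list.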
- sqldatamodel.utils.flatten_json(json_source: list | dict, flatten_rows: bool = True, level_sep: str = '_', key_prefix: str = None) dict[source]
Parses raw JSON data and flattens it into a dictionary with optional normalization.
- Parameters:
json_source (dict | list) – The raw JSON data to be parsed.
flatten_rows (bool) – If True, the data will be normalized into columns and rows. If False, columns will be concatenated from each row using the specified key_prefix.
level_sep (str) – Separates nested levels from other levels and used to concatenate prefix to column.
key_prefix (str) – The prefix to prepend to the JSON keys. If None, an empty string is used.
- Returns:
A flattened dictionary representing the parsed JSON data.
- Return type:
dict
Example:
    import sqldatamodel as sdm

    # Sample JSON
    json_source = [{
        "alpha": "A",
        "value": 1
    },
    {
        "alpha": "B",
        "value": 2
    },
    {
        "alpha": "C",
        "value": 3
    }]

    # Flatten JSON with normalization
    flattened_data = sdm.SQLDataModel.flatten_json(json_source, flatten_rows=True)

    # Format of result
    flattened_data = {"alpha": ['A','B','C'], "value": [1, 2, 3]}

    # Alternatively, flatten columns without rows and adding a prefix
    flattened_data = sdm.SQLDataModel.flatten_json(json_source, key_prefix='row_', flatten_rows=False)

    # Format of result
    flattened_data = {'row_0_alpha': 'A', 'row_0_value': 1, 'row_1_alpha': 'B', 'row_1_value': 2, 'row_2_alpha': 'C', 'row_2_value': 3}
Note
Used by SQLDataModel.from_dict() to flatten deeply nested JSON objects into 2 dimensions when encountered.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
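To make the row-normalized result format concrete, here is a small standalone sketch that flattens nested records into a column-oriented dictionary. The helpers `flatten_record` and `flatten_rows_sketch` are hypothetical and cover only dictionaries nested inside a list of records, filling missing keys with `None` as an assumption.

```python
def flatten_record(record, level_sep="_", prefix=""):
    """Flatten one nested dict into a single level, joining keys with level_sep."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{level_sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_record(value, level_sep, name))
        else:
            flat[name] = value
    return flat

def flatten_rows_sketch(records, level_sep="_"):
    """Normalize a list of records into {column: [value per row]}."""
    flat_records = [flatten_record(rec, level_sep) for rec in records]
    columns = []
    for rec in flat_records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    # Rows missing a column get None (assumed fill value)
    return {col: [rec.get(col) for rec in flat_records] for col in columns}

json_source = [{"alpha": "A", "value": 1}, {"alpha": "B", "value": 2}, {"alpha": "C", "value": 3}]
print(flatten_rows_sketch(json_source))
print(flatten_rows_sketch([{"id": 1, "geo": {"lat": 1.5}}, {"id": 2}]))
```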
- sqldatamodel.utils.generate_html_table_chunks(html_source: str) Generator[str, None, None][source]
Generate chunks of HTML content for all <table> elements found in provided source as complete and unbroken chunks for parsing.
- Parameters:
html_source (str) – The raw HTML content from which to generate chunks.
- Raises:
ValueError – If zero <table> elements were found in html_source provided.
- Yields:
str – Chunks of HTML content containing complete <table> elements.
Example:
    import sqldatamodel as sdm

    # HTML content to chunk
    html_source = '''
    <html>
    <table><tr><td>Table 1</td></tr></table>
    ...
    <p>Non-table elements</p>
    ...
    <table><tr><td>Table 2</td></tr></table>
    </html>
    '''

    # Generate and view the returned table chunks
    for chunk in sdm.SQLDataModel.generate_html_table_chunks(html_source):
        print('Chunk:', chunk)
This will output:
    Chunk: <table><tr><td>Table 1</td></tr></table>
    Chunk: <table><tr><td>Table 2</td></tr></table>
Note
HTML content before the first <table> element and after the last </table> element is ignored and not yielded.
See SQLDataModel.from_html() for full implementation and how this function is used for HTML parsing.
- Changelog:
- Version 0.2.1 (2024-03-24):
New method.
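One simple way to yield complete table chunks is a non-greedy regular expression over the source, sketched below. This swaps in a regex-based technique for illustration and is not necessarily how the package does it; the names `TABLE_PATTERN` and `table_chunks` are hypothetical, and nested tables are not handled.

```python
import re

# Match from an opening <table ...> tag to the nearest closing </table>
TABLE_PATTERN = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)

def table_chunks(html_source: str):
    """Yield each complete <table>...</table> block found in html_source."""
    found = False
    for match in TABLE_PATTERN.finditer(html_source):
        found = True
        yield match.group(0)
    if not found:
        # Raised when the generator is consumed, not when it is created
        raise ValueError("no <table> elements found in html_source")

html_source = """
<html>
<table><tr><td>Table 1</td></tr></table>
<p>Non-table elements</p>
<table><tr><td>Table 2</td></tr></table>
</html>
"""
chunks = list(table_chunks(html_source))
for chunk in chunks:
    print("Chunk:", chunk)
```

Yielding chunks lets a downstream parser stop as soon as the target table is found instead of processing the whole page.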
- sqldatamodel.utils.infer_str_type(obj: str, date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') str[source]
Infer the data type of the input object.
- Parameters:
obj (str) – The object for which the data type is to be inferred.
date_format (str) – The format string to use for parsing date values. Default is ‘%Y-%m-%d’.
datetime_format (str) – The format string to use for parsing datetime values. Default is ‘%Y-%m-%d %H:%M:%S’.
- Returns:
The inferred data type.
- Return type:
str
- Inference:
'str': If the input object is a string, or cannot be parsed as another data type.
'date': If the input object represents a date without time information.
'datetime': If the input object represents a datetime with both date and time information.
'int': If the input object represents an integer.
'float': If the input object represents a floating-point number.
'bool': If the input object represents a boolean value.
'bytes': If the input object represents a binary array.
'None': If the input object is None, empty, or not a string.
Note
This method attempts to infer the data type of the input object by evaluating its content.
If the input object is a string, it is parsed to determine whether it represents a date, datetime, integer, or float.
If the input object is not a string or cannot be parsed, its type is determined based on its Python type (bool, int, float, bytes, or None).
- Changelog:
- Version 2.3.2 (2026-01-23):
Modified check for possible bytes to include the lowercase prefix format x’<BYTES>’.
- Version 2.3.0 (2026-01-21):
Added additional check for possible bytes data if obj matches the format X’<BYTES>’ where the bytes are in valid hexadecimal format.
- Version 0.1.9 (2024-03-19):
New method.
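A try-in-order approach is one common way to implement this kind of inference, sketched here with the standard library only. The function name `infer_str_type_sketch` and the order of the checks are assumptions, and the bytes detection described in the changelog is left out for brevity.

```python
from datetime import datetime

def infer_str_type_sketch(obj, date_format="%Y-%m-%d", datetime_format="%Y-%m-%d %H:%M:%S"):
    """Return a type label for a string by attempting progressively looser parses."""
    if not isinstance(obj, str) or obj.strip() == "":
        return "None"
    text = obj.strip()
    if text.lower() in ("true", "false"):
        return "bool"
    # int is tried before float because every integer literal also parses as a float
    for label, caster in (("int", int), ("float", float)):
        try:
            caster(text)
            return label
        except ValueError:
            pass
    for label, fmt in (("date", date_format), ("datetime", datetime_format)):
        try:
            datetime.strptime(text, fmt)
            return label
        except ValueError:
            pass
    return "str"

samples = ["42", "3.14", "2024-03-19", "2024-03-19 08:30:00", "true", "hello", ""]
labels = [infer_str_type_sketch(s) for s in samples]
print(labels)
```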
- sqldatamodel.utils.infer_types_from_data(input_data: list[list], date_format: str = '%Y-%m-%d', datetime_format: str = '%Y-%m-%d %H:%M:%S') list[str][source]
Infer the best types of input_data by using a simple presence-based voting scheme. Sampling is assumed prior to function call, treating input_data as already a sampled subset from the original data.
- Parameters:
input_data (list[list]) – A list of lists containing the input data.
date_format (str) – The format string to use for parsing date values. Default is ‘%Y-%m-%d’.
datetime_format (str) – The format string to use for parsing datetime values. Default is ‘%Y-%m-%d %H:%M:%S’.
- Returns:
A list representing the best-matching inferred types for each column based on the sampled data.
- Return type:
list
Note
If multiple types are present in the samples, the most appropriate type is inferred based on certain rules.
If a column contains both date and datetime instances, the type is inferred as datetime.
If a column contains both int and float instances, the type is inferred as float.
If a column contains only str instances or multiple types with no clear choice, the type remains as str.
See SQLDataModel.infer_str_type() for type determination process.
- Changelog:
- Version 0.1.9 (2024-03-19):
New method.
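The resolution rules in the note above can be expressed as a presence-based vote over per-cell type labels. In this sketch the labels are assumed to have been computed already (for example by a function like infer_str_type), the helper names are hypothetical, and treating an all-empty column as 'str' is an assumption.

```python
def vote_column_type(labels):
    """Pick one type for a column from the set of type labels present in it."""
    present = set(labels) - {"None"}  # empty cells do not get a vote
    if not present:
        return "str"  # assumption: fall back to str when nothing was observed
    if len(present) == 1:
        return present.pop()
    if present == {"date", "datetime"}:
        return "datetime"
    if present == {"int", "float"}:
        return "float"
    return "str"  # mixed types with no clear choice

def infer_column_types(label_rows):
    """Transpose row-wise labels into columns and vote on each column."""
    return [vote_column_type(column) for column in zip(*label_rows)]

label_rows = [
    ["int", "date", "str"],
    ["float", "datetime", "int"],
    ["None", "date", "str"],
]
column_types = infer_column_types(label_rows)
print(column_types)
```

Only the presence of a type matters here, not how often it occurs, which is what makes the scheme cheap to run on a small sample.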
- sqldatamodel.utils.sqlite_cast_type_format(param: str = '?', dtype: Literal['None', 'int', 'float', 'str', 'bytes', 'date', 'datetime', 'NoneType', 'bool'] = 'str', as_binding: bool = True, as_alias: bool = False)[source]
Formats the specified param to be cast consistently into the python type specified for insert params or as a named alias param.
- Parameters:
param (str) – The parameter to be formatted.
dtype (Literal['None', 'int', 'float', 'str', 'bytes', 'date', 'datetime', 'NoneType', 'bool']) – The python data type of the parameter as a string.
as_binding (bool, optional) – Whether to format as a binding parameter (default is True).
as_alias (bool, optional) – Whether to include an alias for the parameter (default is False).
- Returns:
The parameter formatted for SQL type casting.
- Return type:
str
Note
This function provides consistent formatting for casting parameters into specific data types for SQLite, changing it will lead to unexpected behaviors.
Used by SQLDataModel.__init__() with as_binding=True to allow parameterized inserts to cast to appropriate data type.
- Changelog:
- Version 2.3.2 (2026-01-23):
Modified handling for bytes dtype further to handle parsing values of mixed ASCII encoded bytes and escaped hexadecimal bytes.
- Version 2.3.0 (2026-01-21):
Added support for alternate bytes format X’<BYTES>’ so that binary data is correctly handled when formatted as hexadecimal and prefixed by either ‘b’ or ‘X’.
- Version 0.7.6 (2024-06-16):
Added support for additional date formats when dtype='date' including: '%m/%d/%Y', '%m-%d-%Y', '%m.%d.%Y', '%Y/%m/%d', '%Y-%m-%d', '%Y.%m.%d'.
Modified behavior when dtype='bytes' to avoid the need for any additional checks after insert.
- Version 0.3.3 (2024-04-03):
New method.
- sqldatamodel.utils.sqlite_printf_format(column: str, dtype: str, max_pad_width: int, float_precision: int = 4, alignment: str = None, escape_newline: bool = False, truncation_chars: str = '⠤⠄') str[source]
Formats SQLite SELECT clauses based on column parameters to provide preformatted fetches, providing most of the formatting for repr output.
- Parameters:
column (str) – The name of the column.
dtype (str) – The data type of the column (‘float’, ‘int’, ‘bytes’, ‘index’, or ‘custom’).
max_pad_width (int) – The maximum width to pad the output.
float_precision (int, optional) – The precision for floating-point numbers (default is 4).
alignment (str, optional) – The alignment of the output (‘<’, ‘>’, or None for no alignment).
escape_newline (bool, optional) – If newline characters should be escaped when dtype = 'str'. Default is False.
truncation_chars (str, optional) – Truncation characters to use if column exceeds maximum width. Default is '⠤⠄'.
- Returns:
The formatted SELECT clause for SQLite.
- Return type:
str
Note
This function generates SQLite SELECT clauses for single column only.
The output preformats the SELECT result to fit the repr method for tabular output.
The returned str is not valid SQL by itself, representing only the single column select portion.
- Changelog:
- Version 2.3.0 (2026-01-21):
Modified handling for bytes dtype to more closely align with conventional representation of X’<BYTES>’ instead of previous approach.
- Version 0.11.0 (2024-07-05):
Added truncation_chars keyword argument to allow custom truncation characters when column value exceeds maximum width.
- Version 0.10.4 (2024-07-03):
Added escape_newline keyword argument to escape newline characters to prevent wrapping lines when called by SQLDataModel.__repr__()
- Version 0.7.0 (2024-06-08):
Added preemptive check for custom flag to pass through string formatting directly to support horizontally centered repr changes.
- Version 0.1.9 (2024-03-19):
New method.
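The idea of letting SQLite preformat values inside the SELECT clause can be illustrated with its built-in printf function. This is a rough sketch only: the helper `padded_select` is hypothetical, it ignores truncation characters and newline escaping, and the clause the package actually generates is likely more involved.

```python
import sqlite3

def padded_select(column, width, alignment="<", float_precision=None):
    """Build a single-column select fragment that pads (and truncates) in SQL."""
    flag = "-" if alignment == "<" else ""  # '-' left-justifies in printf
    if float_precision is not None:
        spec = f"%{flag}{width}.{float_precision}f"
    else:
        # width pads short values, the precision part truncates long ones
        spec = f"%{flag}{width}.{width}s"
    return f"printf('{spec}', \"{column}\") as \"{column}\""

conn = sqlite3.connect(":memory:")
conn.execute("create table t (name text, score real)")
conn.executemany("insert into t values (?, ?)", [("Alice", 3.14159), ("Bartholomew-Long", 2.5)])

clauses = [padded_select("name", 10), padded_select("score", 8, ">", float_precision=2)]
rows = conn.execute(f"select {', '.join(clauses)} from t").fetchall()
for row in rows:
    print("|".join(row))
```

As with the documented function, each fragment is not valid SQL on its own and only becomes a statement once joined into a full select.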