A Better Approach to Metadata With bb | by Matt Simmons

bb is a brand new and obscure language for data serialisation (like YAML or JSON). Unlike other languages, it lets you define your own types, so you can store the same information in far fewer characters.

The core idea of bb is to create a 1-to-1 relationship between the raw data and the information it represents: compress the data, but keep it human readable.

JSON = information + fluff + repeated information
bb = information

For example, imagine we’re receiving a stream of event data:

JSON would look like:

[
{"type": "trade", "action": "sell", "price": 353, "volume": 4}, 
{"type": "trade", "action": "buy", "price": 354, "volume": 12}, 
{"type": "trade", "action": "sell", "price": 360, "volume": 10}
]

The equivalent bb would look like:

4ts@353 12tb@354 10ts@360

All I need to do is pre-define the type “trade” like so:

t = {type: trade, s: is_sell, b: is_buy, @: price}

The bb program converts this to JSON. The data look slightly different, but the information encoded is the same.

[
{"type":"trade","quantity":4,"is_sell":true,"price":353},
{"type":"trade","quantity":12,"is_buy":true,"price":354},
{"type":"trade","quantity":10,"is_sell":true,"price":360}
]
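To make the mapping concrete, here is a minimal Python sketch of how a definition like the one above could turn each compact token into a JSON record. It is a toy illustration written for this one example, not bb's actual parser: the names `TRADE_TYPE`, `parse_trade` and `parse_stream` are hypothetical, and it assumes the leading number is the quantity, a bare letter is a boolean flag, and `@` is followed by an integer.

```python
import json
import re

# Hypothetical stand-in for: t = {type: trade, s: is_sell, b: is_buy, @: price}
TRADE_TYPE = {"type": "trade", "s": "is_sell", "b": "is_buy", "@": "price"}

# A token is: quantity, the unit "t", then any number of modifiers.
TOKEN = re.compile(r"(\d+)t((?:[sb]|@\d+)*)$")
# A modifier is either a flag letter or "@" followed by a number.
MODIFIER = re.compile(r"([sb])|@(\d+)")


def parse_trade(token):
    """Expand one token such as '4ts@353' into a dict."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"not a trade token: {token!r}")
    record = {"type": TRADE_TYPE["type"], "quantity": int(match.group(1))}
    for flag, price in MODIFIER.findall(match.group(2)):
        if flag:
            record[TRADE_TYPE[flag]] = True          # e.g. "s" -> is_sell: true
        else:
            record[TRADE_TYPE["@"]] = int(price)     # "@353" -> price: 353
    return record


def parse_stream(text):
    """Parse whitespace-separated tokens into a list of records."""
    return [parse_trade(token) for token in text.split()]


records = parse_stream("4ts@353 12tb@354 10ts@360")
print(json.dumps(records))
```

The point of the sketch is that every character in a token carries information: the type definition supplies the key names once, so they never have to be repeated in the data.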

You can try this out yourself in the bb playground, or if you use Python:

# requires: pip install bb-python
import bb
data = bb.convert("""
t = {type: trade, s: is_sell, b: is_buy, @: price}
4ts@353 12tb@354 10ts@360
""")

bb makes data smaller, which gives you the ability to add data wherever you need it, and type it out in a fraction of the time.
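As a rough check on the size claim, the sketch below compares character counts for the trade example using only Python's standard library. The variable names are illustrative, and the JSON is serialised without optional whitespace so the comparison is as favourable to JSON as possible.

```python
import json

# The three events from the JSON example earlier in the article.
events = [
    {"type": "trade", "action": "sell", "price": 353, "volume": 4},
    {"type": "trade", "action": "buy", "price": 354, "volume": 12},
    {"type": "trade", "action": "sell", "price": 360, "volume": 10},
]

# Compact JSON: no spaces after separators.
json_text = json.dumps(events, separators=(",", ":"))

# The same events in bb, plus the one-off type definition.
bb_text = "4ts@353 12tb@354 10ts@360"
definition = "t = {type: trade, s: is_sell, b: is_buy, @: price}"

print("JSON characters:      ", len(json_text))
print("bb characters:        ", len(bb_text))
print("bb incl. definition:  ", len(bb_text) + len(definition))
```

The definition is a fixed cost paid once, so the saving grows with the number of events in the stream.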

Programmers are already used to adding small amounts of information everywhere with comments in their code, but these are just text, not usable data. bb gives you the ability to encode much more information into your comments and then extract it from them as JSON.

Use case: SQL Metadata

What would happen if we started adding small amounts of bb to our code base, for example, throughout an ML prediction service…

We could define tests for each feature and the business logic used, all in the same script:

/*bb
md`# Product Features Script
Generates features relating to individual products.`
destination:dataset.table_2
author:Matt
jira:PROJ-93 jira:PROJ-94
*/
SELECT
id, --bb col"id: Unique ID for the store from the xyz system" testUnique:id testNotNull:id
AVG(price) AS mean_price, --bb feat"mean_price: Average price of products sold at this store in GBP"
SUM(sales) AS total_sales, --bb feat"total_sales: Number of sales at this store in a 6 month window"
AVG(revenue) AS mean_revenue --bb feat"mean_revenue: Average revenue in a 6 month window"
--bb testNotNull:mean_revenue
--bb test"SELECT * FROM dataset.table_2 WHERE mean_revenue < 0"
FROM
`dataset.table`
WHERE
product_type != "test" --bb logic"Don't use test products because they are not real"
GROUP BY
id

The bb gives us the following information:

Column descriptions
The name and description of a feature, and where it gets defined
The business logic that’s being applied and where it gets applied
The feature(s) that are affected by the business logic
All of the documentation relating to this data pipeline stage as markdown
The name of the dataset and table that this script creates
The name of the developers who wrote the script
The IDs of the Jira tickets that relate to this script
Which expectation tests we need to run

That’s a lot of metadata, and all of it is stored in the same file, which is now the single source of truth.

What can we do with this information?

Create an index of features, track which ones we’ve used, and easily find their definitions when we need to edit them
Populate field and table descriptions in Snowflake/BigQuery/wherever
Automate the creation of an easy to navigate documentation website. Add links to the script and the Jira ticket to the docs.

Most of this useful metadata is information that nobody would have bothered to write down if we had to go to the trouble of collecting it via a form or entering it into a spreadsheet. It’s also harder to forget to update a field description when someone changes the script, because it’s right in front of you.

How do you extract that metadata from all of those comments? Easy:

bb -i features.sql

Or if you’re using the Python client:

# requires: pip install bb-python
import bb
bb.extract(open("features.sql").read())

Try it out today: github.com/MattSimmons1/bb

A Better Approach to Metadata With bb | by Matt Simmons | Jan, 2025
