Quickbooks 2018 desktop hierarchical list view default

11/13/2022

So I’m going to create a string first that will define all the columns where I want to find co-occurrence. BigQuery ML does a good job of hot-encoding strings, but it doesn’t handle arrays as I wish it did (stay tuned). WHERE tag1 IN (SELECT tag FROM active_tags)ĪND tag2 IN (SELECT tag FROM active_tags) SELECT *, MAX(questions) OVER(PARTITION BY tag1) questions_tag1įROM data, UNNEST(SPLIT(tags, '|')) tag1, UNNEST(SPLIT(tags, '|')) tag2 SELECT *, questions/questions_tag1 percent So I’ll take these relationships and I’ll save them on an auxiliary table - plus a percentage of how frequently a relationship happens for each tag.ĬREATE OR REPLACE TABLE `deleting.stack_overflow_tag_co_ocurrence`įROM `fh-bigquery.stackoverflow_archive.201906_posts_questions` Let’s find tags that usually go together:Ĭo-occurring tags on Stack Overflow questions ORDER BY 2 DESC Top Stack Overflow tags by number of questions. In this picture I only have 240 tags - how would you group and categorize 4,000+ of them?įROM `fh-bigquery.stackoverflow_archive.201906_posts_questions`, These are the most active Stack Overflow tags since 2018 - they’re a lot. You can check out more about working with Stack Overflow data and BigQuery here and here. In this post he works with BigQuery – Google’s serverless data warehouse – to run k-means clustering over Stack Overflow’s published dataset, which is refreshed and uploaded to Google’s Cloud once a quarter. Felipe Hoffa is a Developer Advocate for Google Cloud.

Visualizing a universe of clustered tags.

How would you group more than 4,000 active Stack Overflow tags into meaningful groups? This is a perfect task for unsupervised learning and k-means clustering - and now you can do all this inside BigQuery.

0 Comments

Quickbooks 2018 desktop hierarchical list view default

Leave a Reply.

Author

Archives

Categories