How to Sharply Cut the Time to "Wow" in Data-Driven Development

Author:: Ben Herzberg
Published On:: January 21, 2021
Estimated reading time:: 1 minute

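Today we'll talk about data, and about how our understanding of its potential has changed. A time to value that was a pipe dream not long ago has now become the standard.

Let me start with a quote from my favorite cult film, The Princess Bride. Of course, I won't use Inigo Montoya's "You killed my father. Prepare to die," or any of the other memorable lines. I'll quote Miracle Max instead: "Don't rush me, sonny. You rush a miracle man, you get rotten miracles. You got money?"

That has also been the guiding principle behind a great deal of data analytics work over the years. Data analysts and data scientists worked diligently to figure out how to extract value from data and produce their "miracles". But neither the time cost nor the financial cost could be avoided: building huge data warehouses or hiring well-paid data scientists and analysts both take substantial investment.

A lot has changed in data analytics since the "miracle" work began.

Data development has moved fast over the past few years. History shows what rises and what falls, and past experience also hints at where things are heading.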

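Here are some of the developments around data:

Billions of IoT devices sending data from their sensors;
Massive improvements in big data technologies, especially elastic, cloud-based ones;
Advances in data science tools and libraries (such as SciPy and TensorFlow), which make machine learning easier to implement;
A rich set of BI tools that help analyze data quickly, with less time spent writing code and queries.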

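This means that data which used to sit in databases organized and accessed through a single, all-powerful DBA team now lives in more flexible structures:

Data lakes holding rich and varied data;
Data warehouses with powerful analytics capabilities.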

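Today, with data warehouses such as Redshift, BigQuery, and Snowflake, companies and organizations have a lot of elasticity: they can start from nothing and grow quickly into a full-scale data warehouse.

As a result, the "miracle" makers of data analytics and data science can now produce their miracles faster.

Once the potential of data is unlocked, companies come to depend on it more and more, even to the point of "addiction". Many companies can easily load large amounts of data into a data warehouse for more internal teams to use.

This is a natural progression. Once a company has tasted the benefits of data-driven decisions, with better marketing and sales plans and correspondingly higher product sales, it naturally wants to apply the power of data to other areas as well (customer success, operations, procurement, HR, and so on).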

Time to Value, and Creating the "Wow"

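Everyone is setting sail on the wave of data-driven value, and so are your competitors.

Don't waste too much time on activities that aren't data analysis; what matters is acting on the data quickly.

Now that we agree on the value of unlocking data's potential, the more time and focus we put into the data itself, the greater our chance of benefiting. And the payoff isn't only profit; it can be a big "wow".

A "wow", as used here, is value that is unexpected and exceeds expectations.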

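The following example illustrates the kind of "wow" that data-driven work can deliver.

A data scientist at a game studio found that they could not only link user characteristics to purchase rates for in-game cosmetics, but also find ways to match different user groups with specific categories or colors of (in-game) cosmetics.

As you can imagine, optimizing conversions suddenly became much easier for that game company.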

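Even when a company has large amounts of data in its data stores, along with the right people and tools to extract value from it, security, regulation, and privacy remain stumbling blocks.

Data may contain sensitive information, so it carries inherent risk. There is the risk of data leaks, for example, and there are the different regulatory frameworks we must adapt to and report against. We also need to know our PII (personally identifiable information): where it is, who can access it, and who has accessed it.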
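As a rough illustration of those three questions, the sketch below tags likely PII columns by name and then reports who is allowed to read the affected tables and who actually has. The table names, name patterns, grants, and access log are all made up for the example; a real inventory would come from the warehouse's own metadata and would need more than name matching.

```python
import re

# Hypothetical name patterns that often indicate PII. Name matching alone
# will miss things, so treat this as a first pass only.
PII_PATTERNS = [r"email", r"phone", r"(^|_)ssn($|_)", r"birth", r"address", r"(^|_)name($|_)"]


def find_pii_columns(schema):
    """Return {table: [columns]} for columns whose names look like PII."""
    flagged = {}
    for table, columns in schema.items():
        hits = [c for c in columns if any(re.search(p, c.lower()) for p in PII_PATTERNS)]
        if hits:
            flagged[table] = hits
    return flagged


def pii_access_summary(flagged, grants, access_log):
    """For each table holding PII, list who can read it and who has read it."""
    summary = {}
    for table in flagged:
        can = sorted(grants.get(table, []))
        did = sorted({user for user, accessed in access_log if accessed == table})
        summary[table] = {"can_access": can, "has_accessed": did}
    return summary


# Made-up example data (assumptions for illustration only).
schema = {
    "players": ["player_id", "email", "birth_date", "level"],
    "purchases": ["purchase_id", "player_id", "item_id", "price"],
}
grants = {"players": ["analyst_a", "analyst_b"], "purchases": ["analyst_a"]}
access_log = [("analyst_a", "players"), ("analyst_a", "purchases")]

flagged = find_pii_columns(schema)
report = pii_access_summary(flagged, grants, access_log)
print(flagged)
print(report)
```

Keeping "can access" and "has accessed" side by side makes it easy to spot grants that are never used and could probably be revoked.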

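For example, suppose our business is a massively multiplayer online game (MMO). We get large amounts of data from the game servers (and from other sources such as enrichment services and web analytics). We want to derive a set of features that predict which players will spend the most on premium cosmetic items, and create discount coupons aimed at those players.

As a business, we want to spend most of our effort analyzing data and building the prediction algorithms, and keep every other time cost to a minimum. We want data access, security, and compliance handled quickly and simply. And if we need to introduce software changes to the MMO, we certainly don't want that process to be so slow that it drags everything down.

So, if we use a Snowflake data warehouse to analyze data drawn from the data lake and other sources, we first need to make sure we follow Snowflake's security guidelines, immediately identify which parts of the project retrieve sensitive data, and build data access audit reports for the various compliance regulations, all without interrupting the "wow creation" process.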
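One way to picture such an audit report is as a grouping of query events by the regulation each touched column falls under. The sketch below works on an in-memory list rather than on Snowflake itself; the `QueryEvent` record, the column-to-regulation tags, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEvent:
    user: str
    table: str
    columns: tuple
    timestamp: str  # ISO 8601 string, kept as text for simplicity


# Hypothetical tagging of sensitive columns with the regulations a team has
# decided they fall under. This mapping is an assumption made for the example.
SENSITIVE_COLUMNS = {
    ("players", "email"): {"GDPR", "CCPA"},
    ("players", "birth_date"): {"GDPR"},
}


def build_audit_reports(events):
    """Group accesses to sensitive columns by regulation tag."""
    reports = defaultdict(list)
    for event in events:
        for column in event.columns:
            for regulation in sorted(SENSITIVE_COLUMNS.get((event.table, column), ())):
                reports[regulation].append(
                    (event.timestamp, event.user, f"{event.table}.{column}")
                )
    # Sort each report chronologically so it reads like an audit trail.
    return {reg: sorted(rows) for reg, rows in reports.items()}


# Made-up query events.
events = [
    QueryEvent("analyst_a", "players", ("player_id", "email"), "2021-01-20T10:00:00"),
    QueryEvent("analyst_b", "players", ("birth_date",), "2021-01-20T09:30:00"),
    QueryEvent("analyst_a", "purchases", ("price",), "2021-01-20T11:00:00"),
]

reports = build_audit_reports(events)
for regulation, rows in sorted(reports.items()):
    print(regulation)
    for timestamp, user, column in rows:
        print(f"  {timestamp}  {user}  {column}")
```

Because the report is derived from the event log after the fact, analysts can keep querying while it is produced, which fits the goal of not interrupting their work.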

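To make use of what we learned from the analysis, we now need to change the software, which calls for a fast, agile development cycle. For rapid iteration like this, especially in a large codebase, the sensible move is to cut build times so the analysis can be put to use. One way to speed up builds is to add compute power and distribute the build across it.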

Software Development Speed Is the Top Priority

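To keep data processing from causing collateral damage, speed matters as much as security.

Most companies can't afford a slow software development process, because it keeps them from putting their data to use in time.

A fast, agile development cycle is essential, but with all this data (which implies a large codebase), compile times can grow. More precisely, more time is wasted waiting and little is actually spent on the data. Fortunately, technology exists to address this: distributed processing can harness the CPUs of other machines to reduce build times.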
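To get a feel for how much distributing a build might help, a back-of-the-envelope estimate based on Amdahl's law is often enough. The sketch below is not taken from any particular build tool; the 40-minute build, the 90% parallelizable share, and the core counts are assumed numbers, and network and scheduling overhead are ignored.

```python
def estimated_build_minutes(serial_minutes, parallel_fraction, cores):
    """Amdahl's law: only the parallelizable share of the build shrinks with more cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_part = serial_minutes * (1.0 - parallel_fraction)
    parallel_part = serial_minutes * parallel_fraction / cores
    return serial_part + parallel_part


# Assumed numbers: a 40-minute build where 90% of the work can run in parallel.
for cores in (1, 8, 64):
    minutes = estimated_build_minutes(40.0, 0.9, cores)
    print(f"{cores:>3} cores: {minutes:.1f} min")
```

The serial share sets a floor: with these numbers no amount of extra cores brings the build under 4 minutes, so shrinking the non-parallel steps matters as well.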

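Sometimes magic is just one more trick up the sleeve. For data analysis, distributed processing is another trick that can help create the "wow".

Once we can deliver truly agile data-driven value and pair it with fast software deployment, we can give our customers the "wow" they deserve!

About the author: Ben Herzberg is the Chief Scientist of Satori.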
